Breast cancer splice variants

ABSTRACT

Provided herein, in some embodiments, are methods, compositions, and systems for identifying alternatively spliced tumor-specific exon inclusion and exclusion events that can be used for survival prognosis.

RELATED APPLICATIONS

This application is a divisional of U.S. application Ser. No. 17/253,974, filed Dec. 18, 2020, which is a national stage filing under 35 U.S.C. § 371 of international application number PCT/US2019/039794, filed Jun. 28, 2019, which claims the benefit under 35 U.S.C. § 119(e) of U.S. provisional application No. 62/692,121, filed Jun. 29, 2018, and U.S. provisional application No. 62/818,582, filed Mar. 14, 2019, each of which is incorporated by reference herein in its entirety.

REFERENCE TO AN ELECTRONIC SEQUENCE LISTING

The contents of the electronic sequence listing (J022770014US03-SEQ-HJD.xml; Size: 235,625 bytes; and Date of Creation: Apr. 18, 2023) are herein incorporated by reference in their entirety.

BACKGROUND

Breast cancer survival rates indicate what portion of people with the same type and stage of breast cancer are still alive a certain amount of time (e.g., 5 years) after they are diagnosed. The extensive heterogeneity of breast cancer, however, complicates a precise assessment of prognosis, making therapeutic decisions difficult and treatments inappropriate in some cases.

SUMMARY

Provided herein, in some aspects, is a molecular profiling platform that may be used, for example, to identify exon splicing events (e.g., exon inclusion or exon exclusion) that are specific to breast cancer and can be used for survival prognosis. Alternative splicing is a biological phenomenon that increases protein diversity. In one type of alternative splicing, referred to as “exon skipping,” an exon is either retained in the mature transcript or, depending on cellular conditions, spliced out of it (“skipped”). Exon skipping events are regulated by RNA-binding proteins (RBPs) and the spliceosome complex. A common metric for evaluating the extent of exon skipping is percent spliced in (PSI or Ψ), which represents the percentage of transcripts that include a specific exon or splice site.
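By way of illustration only, the PSI metric described above can be sketched as a ratio of read counts. The sketch assumes that the number of reads supporting exon inclusion and the number supporting exon exclusion are already available for a given exon; the function name and the use of raw read counts as a proxy for transcript counts are assumptions made for this example and are not taken from the present disclosure.

```python
def percent_spliced_in(inclusion_reads: int, exclusion_reads: int) -> float:
    """Return PSI as a percentage (0-100) for one exon in one sample.

    Assumption: raw read counts stand in for transcript counts. Practical
    pipelines often normalize for the number of junctions or for exon length.
    """
    if inclusion_reads < 0 or exclusion_reads < 0:
        raise ValueError("read counts must be non-negative")
    total = inclusion_reads + exclusion_reads
    if total == 0:
        # No informative reads: PSI is undefined rather than zero.
        raise ValueError("no informative reads for this exon")
    return 100.0 * inclusion_reads / total


# 30 reads support inclusion and 10 support exclusion.
print(percent_spliced_in(30, 10))  # 75.0
```

A PSI near 100 indicates that nearly all transcripts include the exon, and a PSI near 0 indicates that the exon is skipped in nearly all transcripts.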

Prior approaches for analyzing cancer tissue samples separately analyzed a group of normal samples (non-cancerous samples) and a group of cancer samples (samples known to be cancerous) to generate two distributions. Data in the non-overlapping parts of the two distributions would be analyzed to assess the differences between the two groups of samples. Due to the heterogeneity of the biological data, where alternative splicing can occur for reasons other than having cancer (e.g., exon skipping can occur naturally for non-cancerous (normal) healthy patients), the conventional “two-distribution” approach is not well suited to identifying exon skipping events that are predictive of cancer.

The present disclosure provides, in some aspects, methods that combine the analysis (e.g., PSI values) determined for normal and cancer tissue samples and analyze the combined input using a probabilistic model (a Gaussian mixture model, GMM) to identify subpopulations (clusters) within the overall population that can be further analyzed to assess whether they are cancer-specific. Some of the data described herein is based on an analysis of ~9300 normal and tumor samples from The Cancer Genome Atlas (TCGA), which identified ~67,000 exon skipping events. From this data, a subset of exon splicing events (e.g., exon inclusion or exon exclusion) specific to breast cancer was identified.
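By way of illustration only, the pooled clustering approach described above can be sketched as follows. The sketch fits a one-dimensional Gaussian mixture to pooled PSI values by expectation-maximization and then reports the fraction of tumor samples in each cluster. The function names, the fixed number of components, the initialization, the variance floor, and the synthetic PSI values (expressed here as fractions from 0 to 1) are all assumptions made for this example; they are not the implementation or the data of the present disclosure.

```python
import math


def fit_gmm_1d(values, k, iters=100, min_var=1e-4):
    """Fit a k-component 1-D Gaussian mixture by EM.

    Returns (weights, means, variances). Initial means are spread evenly
    over the data range, which is an assumption of this sketch.
    """
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0
    means = [lo + (j + 0.5) * span / k for j in range(k)]
    variances = [max((span / (2 * k)) ** 2, min_var)] * k
    weights = [1.0 / k] * k
    n = len(values)
    for _ in range(iters):
        # E-step: responsibility of each component for each sample.
        resp = []
        for x in values:
            dens = [w * math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)
                    for w, m, v in zip(weights, means, variances)]
            total = sum(dens) or 1e-300
            resp.append([d / total for d in dens])
        # M-step: re-estimate weight, mean, and variance of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp) or 1e-300
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, values)) / nj
            variances[j] = max(
                sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, values)) / nj,
                min_var)
    return weights, means, variances


def assign_cluster(x, weights, means, variances):
    """Return the index of the most probable component for x (log scale)."""
    scores = [math.log(w) - 0.5 * math.log(v) - (x - m) ** 2 / (2 * v)
              for w, m, v in zip(weights, means, variances)]
    return scores.index(max(scores))


def tumor_fraction_by_cluster(psi, is_tumor, k):
    """Cluster pooled PSI values; return (cluster mean, tumor fraction, size)
    for each cluster, sorted by cluster mean."""
    weights, means, variances = fit_gmm_1d(psi, k)
    tumor = [0] * k
    size = [0] * k
    for x, t in zip(psi, is_tumor):
        c = assign_cluster(x, weights, means, variances)
        size[c] += 1
        tumor[c] += int(t)
    rows = [(means[j], tumor[j] / size[j] if size[j] else None, size[j])
            for j in range(k)]
    return sorted(rows, key=lambda row: row[0])


# Synthetic, deterministic PSI values (hypothetical, for illustration only).
normal_psi = [0.02 + 0.001 * i for i in range(60)]       # exon mostly skipped
tumor_low_psi = [0.02 + 0.0015 * i for i in range(40)]   # tumors resembling normal
tumor_high_psi = [0.75 + 0.004 * i for i in range(30)]   # tumors including the exon

psi = normal_psi + tumor_low_psi + tumor_high_psi
is_tumor = [False] * 60 + [True] * 70

summary = tumor_fraction_by_cluster(psi, is_tumor, k=2)
for mean, fraction, size in summary:
    print(f"cluster mean PSI={mean:.3f}  tumor fraction={fraction:.2f}  n={size}")
```

In this synthetic example, the low-PSI cluster mixes normal and tumor samples, whereas the high-PSI cluster contains only tumor samples and would therefore be the candidate tumor-specific cluster carried forward for further analysis. In practice, the number of components would be chosen per splicing event (for example, with a model-selection criterion) rather than fixed in advance.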

In some aspects, the present disclosure provides a method comprising assaying nucleic acids of a sample for the presence or absence of a target exon comprising a nucleotide sequence of any one of SEQ ID NOS: 22-24, 26-36, 38-40, 73-75, 77-79, 82-100, 102-104. In some embodiments, the target exon comprises a nucleotide sequence of any one of SEQ ID NOS: 27, 98, 102, or 104.

In other aspects, the present disclosure provides a method comprising assaying nucleic acids of a sample for the presence or absence of at least 2 target exons, wherein each target exon comprises a nucleotide sequence of any one of SEQ ID NOS: 23, 27, 35, 85, 88, 89, 98, 101, 102, or 104. In some embodiments, each target exon comprises a nucleotide sequence of any one of SEQ ID NOS: 27, 98, 101, 102, or 104.

In yet other aspects, the present disclosure provides a method comprising assaying nucleic acids of a sample for the presence or absence of at least 3 target exons, wherein each target exon comprises a nucleotide sequence of any one of SEQ ID NOS: 21, 23, 27, 30, 31, 32, 35, 36, 39, 85, 87-89, 91, 94, 98, or 101-104.

In still further aspects, the present disclosure provides a method comprising assaying nucleic acids of a sample for the presence or absence of at least 8 different target exons, wherein each target exon comprises a nucleotide sequence of any one of SEQ ID NOs: 21-40 or 73-104.

In some embodiments, the sample is a breast tissue sample. For example, the sample may be obtained from a subject suspected of having, at risk of, or diagnosed with breast cancer. In some embodiments, the subject is a female subject.

In some embodiments, the nucleic acids comprise messenger ribonucleic acid (mRNA), or complementary deoxyribonucleic acid (cDNA) synthesized from mRNA obtained from the sample.

In some embodiments, the methods further comprise detecting the presence of a target exon comprising a nucleotide sequence of any one of SEQ ID NOs: 24, 28, 31, 33, and/or 38 or the absence of a target exon comprising a nucleotide sequence of any one of SEQ ID NOs: 82, 87 and/or 91, and assigning a favorable survival prognosis to the sample. In some embodiments, the methods further comprise detecting the presence of a target exon comprising a nucleotide sequence of any one of SEQ ID NOs: 21-23, 25-27, 29, 30, 32, and/or 34-40 or the absence of a target exon comprising a nucleotide sequence of any one of SEQ ID NOs: 73-81, 83-86, 88-90, and/or 92-104, and assigning an unfavorable survival prognosis to the sample.

Also provided herein are complementary deoxyribonucleic acids (cDNAs) comprising a nucleotide sequence of any one of SEQ ID NOs: 1-20 or 105-136. In some embodiments, the cDNAs comprise a nucleotide sequence of any one of SEQ ID NOs: 22-24, 27-34, 36, 38, or 40. Compositions comprising the cDNAs are also contemplated herein. In some embodiments, the compositions further comprise a probe or pair of primers that binds the cDNA. Some compositions of the present disclosure comprise (a) a messenger ribonucleic acid (mRNA) comprising a nucleotide sequence of any one of SEQ ID NOs: 1-20 or 105-136 and (b) a probe or a pair of primers that binds a nucleotide sequence of any one of SEQ ID NOs: 1-20 or 105-136. In some embodiments, the probe or primer comprises a detectable label.

Further provided herein are kits comprising a molecule that can detect the presence or absence of a target exon comprising a nucleotide sequence of any one of SEQ ID NOS: 22-24, 26-36, 38-40, 73-75, 77-79, 82-100, 102-104, and a detection reagent selected from buffers, salts, polymerases, and deoxyribonucleotide triphosphates (dNTPs). In some embodiments, the molecule comprises a probe or primer that binds a nucleic acid comprising a nucleotide sequence of any one of SEQ ID NOS: 22-24, 26-36, 38-40, 73-75, 77-79, 82-100, 102-104.

Also provided herein are kits comprising: (a) molecules that can detect the presence or absence of at least 2 target exons, wherein each target exon comprises a nucleotide sequence of any one of SEQ ID NOS: 23, 27, 35, 85, 88, 89, 98, 101, 102, or 104, (b) molecules that can detect the presence or absence of at least 3 target exons, wherein each target exon comprises a nucleotide sequence of any one of SEQ ID NOS: 21, 23, 27, 30, 31, 32, 35, 36, 39, 85, 87-89, 91, 94, 98, or 101-104, or (c) molecules that can detect the presence or absence of at least 8 different target exons, wherein each target exon comprises a nucleotide sequence of any one of SEQ ID NOs: 21-40 or 73-104, and a detection reagent selected from buffers, salts, polymerases, and deoxyribonucleotide triphosphates (dNTPs). In some embodiments, at least one of the probes and/or primers comprises a detectable label.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1A: Alternative splicing leads to target exon inclusion or exon exclusion in cancer patients when compared to normal tissues. FIG. 1B: Frequency of exon splicing events (e.g., exon inclusion and exon exclusion) in TCGA patients. In total, 20 exon inclusion events and 32 exon exclusion events that are breast cancer specific and associated with survival were detected using the novel Gaussian mixture modeling (GMM) clustering approach. The table indicates the presence or absence of the 52 exon splicing events (rows) across 824 breast cancer patients in TCGA (columns). Exon splicing events are ordered by frequency. Unfavorable and favorable prognosis are shown, respectively.

FIG. 2A: Frequency (%) of detection for the list of 52-exon splicing events in the TCGA cohort with survival information (n=824, above). FIG. 2B: Type of exon splicing biomarker detected in patients using the 52-exon splicing biomarker panel.

FIG. 3A: GMM analysis of mixed normal and breast cancer samples for the splicing event 1446 (CCDC115 gene). The GMM analysis showed 4 distinct clusters (subpopulations). The x-axis indicates the exon percent spliced in (PSI, Ψ) level within samples, and y-axis denotes the number of samples in a normalized density scale. Shading indicates the cluster assignment of each sample. FIG. 3B: Frequency (%) of tumor and normal samples across the 4 clusters identified for the splicing event 1446 (CCDC115 gene). Cluster 4 is composed mostly of breast cancer samples. FIG. 3C: Exon levels (PSI) for tumor specific cluster C4 and normal tissues in TCGA. This analysis indicates that the target exon (also referred to herein as an “alternative exon”) is expressed in 97 breast cancer patients in cluster C4, while very low or absent in normal tissues. FIG. 3D: Survival analysis of breast cancer patients in cluster C4 versus the remaining breast cancer patients in TCGA. This analysis indicates that patients in C4 (expressing the target exon) have a worse overall survival (shorter survival time, days).

FIG. 4A: GMM analysis of mixed normal and breast cancer samples for the splicing event 13343 (ENAH gene). The GMM analysis showed 3 distinct clusters (subpopulations). The x-axis indicates the exon PSI (Ψ) level within samples, and y-axis denotes the number of samples in a normalized density scale. Shading indicates the cluster assignment of each sample. FIG. 4B: Frequency (%) of tumor and normal samples across the 3 clusters identified for the splicing event 13343 (ENAH gene). Cluster 3 is composed mostly of breast cancer samples. FIG. 4C: Exon splicing levels (PSI) for tumor specific cluster C3 and normal tissues in TCGA. This analysis indicates that the target exon is expressed in 41 breast cancer patients in cluster C3, while very low or absent in normal tissues. FIG. 4D: Survival analysis of breast cancer patients in cluster C3 versus the remaining breast cancer patients in TCGA. This analysis indicates that patients in C3 (expressing the target exon) have a worse overall survival (shorter survival time, days).

FIG. 5A: GMM analysis of mixed normal and breast cancer samples for the splicing event 15088 (POLI gene). The GMM analysis showed 3 distinct clusters (subpopulations). The x-axis indicates the exon PSI (Ψ) level within samples, and y-axis denotes the number of samples in a normalized density scale. Shading indicates the cluster assignment of each sample. FIG. 5B: Frequency (%) of tumor and normal samples across the 3 clusters identified for the splicing event 15088 (POLI gene). Cluster 3 is composed mostly of breast cancer samples. FIG. 5C: Exon splicing levels (PSI) for tumor specific cluster C3 and normal tissues in TCGA. This analysis indicates that the target exon is expressed in 100 breast cancer patients in cluster C3, while very low or absent in normal tissues. FIG. 5D: Survival analysis of breast cancer patients in cluster C3 versus the remaining breast cancer patients in TCGA. This analysis indicates that patients in C3 (expressing the exon) have a worse overall survival (shorter survival time, days).

FIG. 6A: GMM analysis of mixed normal and breast cancer samples for the splicing event 16864 (PLXNB1 gene). The GMM analysis showed 4 distinct clusters (subpopulations). The x-axis indicates the exon PSI (Ψ) level within samples, and y-axis denotes the number of samples in a normalized density scale. Shading indicates the cluster assignment of each sample. FIG. 6B: Frequency (%) of tumor and normal samples across the 4 clusters identified for the splicing event 16864 (PLXNB1 gene). Cluster 4 is composed mostly of breast cancer samples. FIG. 6C: Exon splicing levels (PSI) for tumor specific cluster C4 and normal tissues in TCGA. This analysis indicates that the target exon is expressed in 74 breast cancer patients in cluster C4, while very low or absent in normal tissues. FIG. 6D: Survival analysis of breast cancer patients in cluster C4 versus the remaining breast cancer patients in TCGA. This analysis indicates that patients in C4 (expressing the target exon) have a better overall survival (longer survival time, days).

FIG. 7A: GMM analysis of mixed normal and breast cancer samples for the splicing event 21181 (SH3GLB1 gene). The GMM analysis showed 4 distinct clusters (subpopulations). The x-axis indicates the exon PSI (Ψ) level within samples, and y-axis denotes the number of samples in a normalized density scale. Shading indicates the cluster assignment of each sample. FIG. 7B: Frequency (%) of tumor and normal samples across the 4 clusters identified for the splicing event 21181 (SH3GLB1 gene). Cluster 4 is composed mostly of breast cancer samples. FIG. 7C: Exon splicing levels (PSI) for tumor specific cluster C4 and normal tissues in TCGA. This analysis indicates that the target exon is expressed in 57 breast cancer patients in cluster C4, while very low or absent in normal tissues. FIG. 7D: Survival analysis of breast cancer patients in cluster C4 versus the remaining breast cancer patients in TCGA. This analysis indicates that patients in C4 (expressing the target exon) have a worse overall survival (shorter survival time, days).

FIG. 8A: GMM analysis of mixed normal and breast cancer samples for the splicing event 34793 (TCF25 gene). The GMM analysis showed 4 distinct clusters (subpopulations). The x-axis indicates the exon PSI (Ψ) level within samples, and y-axis denotes the number of samples in a normalized density scale. Shading indicates the cluster assignment of each sample. FIG. 8B: Frequency (%) of tumor and normal samples across the 4 clusters identified for the splicing event 34793 (TCF25 gene). Cluster 4 is composed mostly of breast cancer samples. FIG. 8C: Exon splicing levels (PSI) for tumor specific cluster C4 and normal tissues in TCGA. This analysis indicates that the target exon is expressed in 32 breast cancer patients in cluster C4, while very low or absent in normal tissues. FIG. 8D: Survival analysis of breast cancer patients in cluster C4 versus the remaining breast cancer patients in TCGA. This analysis indicates that patients in C4 (expressing the target exon) have a worse overall survival (shorter survival time, days).

FIG. 9A: GMM analysis of mixed normal and breast cancer samples for the splicing event 42420 (PRR5-ARHGAP8 gene). The GMM analysis showed 4 distinct clusters (subpopulations). The x-axis indicates the exon PSI (Ψ) level within samples, and y-axis denotes the number of samples in a normalized density scale. Shading indicates the cluster assignment of each sample. FIG. 9B: Frequency (%) of tumor and normal samples across the 4 clusters identified for the splicing event 42420 (PRR5-ARHGAP8 gene). Cluster 3 is composed mostly of breast cancer samples. FIG. 9C: Exon splicing levels (PSI) for tumor specific cluster C3 and normal tissues in TCGA. This analysis indicates that the target exon is expressed in 265 breast cancer patients in cluster C3, while very low or absent in normal tissues. FIG. 9D: Survival analysis of breast cancer patients in cluster C3 versus the remaining breast cancer patients in TCGA. This analysis indicates that patients in C3 (expressing the target exon) have a worse overall survival (shorter survival time, days).

FIG. 10A: GMM analysis of mixed normal and breast cancer samples for the splicing event 4322 (WDR45B gene). The GMM analysis showed 4 distinct clusters (subpopulations). The x-axis indicates the exon PSI (Ψ) level within samples, and y-axis denotes the number of samples in a normalized density scale. Shading indicates the cluster assignment of each sample. FIG. 10B: Frequency (%) of tumor and normal samples across the 4 clusters identified for the splicing event 4322 (WDR45B gene). Cluster 4 is composed mostly of breast cancer samples. FIG. 10C: Exon splicing levels (PSI) for tumor specific cluster C4 and normal tissues in TCGA. This analysis indicates that the target exon is expressed in 39 breast cancer patients in cluster C4, while very low or absent in normal tissues. FIG. 10D: Survival analysis of breast cancer patients in cluster C4 versus the remaining breast cancer patients in TCGA. This analysis indicates that patients in C4 (expressing the target exon) have a better overall survival (longer survival time, days).

FIG. 11A: GMM analysis of mixed normal and breast cancer samples for the splicing event 44438 (VPS29 gene). The GMM analysis showed 4 distinct clusters (subpopulations). The x-axis indicates the exon PSI (Ψ) level within samples, and y-axis denotes the number of samples in a normalized density scale. Shading indicates the cluster assignment of each sample. FIG. 11B: Frequency (%) of tumor and normal samples across the 4 clusters identified for the splicing event 44438 (VPS29 gene). Cluster 4 is composed mostly of breast cancer samples. FIG. 11C: Exon splicing levels (PSI) for tumor specific cluster C4 and normal tissues in TCGA. This analysis indicates that the target exon is expressed in 54 breast cancer patients in cluster C4, while very low or absent in normal tissues. FIG. 11D: Survival analysis of breast cancer patients in cluster C4 versus the remaining breast cancer patients in TCGA. This analysis indicates that patients in C4 (expressing the target exon) have a worse overall survival (shorter survival time, days).

FIG. 12A: GMM analysis of mixed normal and breast cancer samples for the splicing event 48175 (E4F1 gene). The GMM analysis showed 3 distinct clusters (subpopulations). The x-axis indicates the exon PSI (Ψ) level within samples, and y-axis denotes the number of samples in a normalized density scale. Shading indicates the cluster assignment of each sample. FIG. 12B: Frequency (%) of tumor and normal samples across the 3 clusters identified for the splicing event 48175 (E4F1 gene). Cluster 3 is composed mostly of breast cancer samples. FIG. 12C: Exon splicing levels (PSI) for tumor specific cluster C3 and normal tissues in TCGA. This analysis indicates that the target exon is expressed in 60 breast cancer patients in cluster C3, while very low or absent in normal tissues. FIG. 12D: Survival analysis of breast cancer patients in cluster C3 versus the remaining breast cancer patients in TCGA. This analysis indicates that patients in C3 (expressing the target exon) have a worse overall survival (shorter survival time, days).

FIG. 13A: GMM analysis of mixed normal and breast cancer samples for the splicing event 49765 (TEN1-CDK3 gene). The GMM analysis showed 4 distinct clusters (subpopulations). The x-axis indicates the exon PSI (Ψ) level within samples, and y-axis denotes the number of samples in a normalized density scale. Shading indicates the cluster assignment of each sample. FIG. 13B: Frequency (%) of tumor and normal samples across the 4 clusters identified for the splicing event 49765 (TEN1-CDK3 gene). Cluster 4 is composed mostly of breast cancer samples. FIG. 13C: Exon splicing levels (PSI) for tumor specific cluster C4 and normal tissues in TCGA. This analysis indicates that the target exon is expressed in 58 breast cancer patients in cluster C4, while very low or absent in normal tissues. FIG. 13D: Survival analysis of breast cancer patients in cluster C4 versus the remaining breast cancer patients in TCGA. This analysis indicates that patients in C4 (expressing the target exon) have a better overall survival (longer survival time, days).

FIG. 14A: GMM analysis of mixed normal and breast cancer samples for the splicing event 5134 (PLEKHA6 gene). The GMM analysis showed 4 distinct clusters (subpopulations). The x-axis indicates the exon PSI (Ψ) level within samples, and y-axis denotes the number of samples in a normalized density scale. Shading indicates the cluster assignment of each sample. FIG. 14B: Frequency (%) of tumor and normal samples across the 4 clusters identified for the splicing event 5134 (PLEKHA6 gene). Cluster 4 is composed mostly of breast cancer samples. FIG. 14C: Exon splicing levels (PSI) for tumor specific cluster C4 and normal tissues in TCGA. This analysis indicates that the target exon is expressed in 70 breast cancer patients in cluster C4, while very low or absent in normal tissues. FIG. 14D: Survival analysis of breast cancer patients in cluster C4 versus the remaining breast cancer patients in TCGA. This analysis indicates that patients in C4 (expressing the target exon) have a worse overall survival (shorter survival time, days).

FIG. 15A: GMM analysis of mixed normal and breast cancer samples for the splicing event 56552 (GNAZ gene). The GMM analysis showed 4 distinct clusters (subpopulations). The x-axis indicates the exon PSI (Ψ) level within samples, and y-axis denotes the number of samples in a normalized density scale. Shading indicates the cluster assignment of each sample. FIG. 15B: Frequency (%) of tumor and normal samples across the 4 clusters identified for the splicing event 56552 (GNAZ gene). Cluster 4 is composed mostly of breast cancer samples. FIG. 15C: Exon splicing levels (PSI) for tumor specific cluster C4 and normal tissues in TCGA. This analysis indicates that the target exon is expressed in 33 breast cancer patients in cluster C4, while very low or absent in normal tissues. FIG. 15D: Survival analysis of breast cancer patients in cluster C4 versus the remaining breast cancer patients in TCGA. This analysis indicates that patients in C4 (expressing the target exon) have a better overall survival (longer survival time, days).

FIG. 16A: GMM analysis of mixed normal and breast cancer samples for the splicing event 5696 (TTC3 gene). The GMM analysis showed 3 distinct clusters (subpopulations). The x-axis indicates the exon PSI (Ψ) level within samples, and y-axis denotes the number of samples in a normalized density scale. Shading indicates the cluster assignment of each sample. FIG. 16B: Frequency (%) of tumor and normal samples across the 3 clusters identified for the splicing event 5696 (TTC3 gene). Cluster 3 is composed mostly of breast cancer samples. FIG. 16C: Exon splicing levels (PSI) for tumor specific cluster C3 and normal tissues in TCGA. This analysis indicates that the target exon is expressed in 31 breast cancer patients in cluster C3, while very low or absent in normal tissues. FIG. 16D: Survival analysis of breast cancer patients in cluster C3 versus the remaining breast cancer patients in TCGA. This analysis indicates that patients in C3 (expressing the target exon) have a worse overall survival (shorter survival time, days).

FIG. 17A: GMM analysis of mixed normal and breast cancer samples for the splicing event 57139 (RNF8 gene). The GMM analysis showed 2 distinct clusters (subpopulations). The x-axis indicates the exon PSI (Ψ) level within samples, and y-axis denotes the number of samples in a normalized density scale. Shading indicates the cluster assignment of each sample. FIG. 17B: Frequency (%) of tumor and normal samples across the 2 clusters identified for the splicing event 57139 (RNF8 gene). Cluster 2 is composed mostly of breast cancer samples. FIG. 17C: Exon splicing levels (PSI) for tumor specific cluster C2 and normal tissues in TCGA. This analysis indicates that the target exon is expressed in 80 breast cancer patients in cluster C2, while very low or absent in normal tissues. FIG. 17D: Survival analysis of breast cancer patients in cluster C2 versus the remaining breast cancer patients in TCGA. This analysis indicates that patients in C2 (expressing the target exon) have a worse overall survival (shorter survival time, days).

FIG. 18A: GMM analysis of mixed normal and breast cancer samples for the splicing event 57874 (ZDHHC13 gene). The GMM analysis showed 2 distinct clusters (subpopulations). The x-axis indicates the exon PSI (Ψ) level within samples, and y-axis denotes the number of samples in a normalized density scale. Shading indicates the cluster assignment of each sample. FIG. 18B: Frequency (%) of tumor and normal samples across the 2 clusters identified for the splicing event 57874 (ZDHHC13 gene). Cluster 2 is composed mostly of breast cancer samples. FIG. 18C: Exon splicing levels (PSI) for tumor specific cluster C2 and normal tissues in TCGA. This analysis indicates that the target exon is expressed in 58 breast cancer patients in cluster C2, while very low or absent in normal tissues. FIG. 18D: Survival analysis of breast cancer patients in cluster C2 versus the remaining breast cancer patients in TCGA. This analysis indicates that patients in C2 (expressing the target exon) have a worse overall survival (shorter survival time, days).

FIG. 19A: GMM analysis of mixed normal and breast cancer samples for the splicing event 60615 (SH3GLB2 gene). The GMM analysis showed 2 distinct clusters (subpopulations). The x-axis indicates the exon PSI (Ψ) level within samples, and y-axis denotes the number of samples in a normalized density scale. Shading indicates the cluster assignment of each sample. FIG. 19B: Frequency (%) of tumor and normal samples across the 2 clusters identified for the splicing event 60615 (SH3GLB2 gene). Cluster 2 is composed mostly of breast cancer samples. FIG. 19C: Exon splicing levels (PSI) for tumor specific cluster C2 and normal tissues in TCGA. This analysis indicates that the target exon is expressed in 37 breast cancer patients in cluster C2, while very low or absent in normal tissues. FIG. 19D: Survival analysis of breast cancer patients in cluster C2 versus the remaining breast cancer patients in TCGA. This analysis indicates that patients in C2 (expressing the target exon) have a worse overall survival (shorter survival time, days).

FIG. 20A: GMM analysis of mixed normal and breast cancer samples for the splicing event 62560 (ITFG1 gene). The GMM analysis showed 4 distinct clusters (subpopulations). The x-axis indicates the exon PSI (Ψ) level within samples, and y-axis denotes the number of samples in a normalized density scale. Shading indicates the cluster assignment of each sample. FIG. 20B: Frequency (%) of tumor and normal samples across the 4 clusters identified for the splicing event 62560 (ITFG1 gene). Cluster 4 is composed mostly of breast cancer samples. FIG. 20C: Exon splicing levels (PSI) for tumor specific cluster C4 and normal tissues in TCGA. This analysis indicates that the target exon is expressed in 53 breast cancer patients in cluster C4, while very low or absent in normal tissues. FIG. 20D: Survival analysis of breast cancer patients in cluster C4 versus the remaining breast cancer patients in TCGA. This analysis indicates that patients in C4 (expressing the target exon) have a better overall survival (longer survival time, days).

FIG. 21A: GMM analysis of mixed normal and breast cancer samples for the splicing event 6785 (SPATS2 gene). The GMM analysis showed 2 distinct clusters (subpopulations). The x-axis indicates the exon PSI (Ψ) level within samples, and y-axis denotes the number of samples in a normalized density scale. Shading indicates the cluster assignment of each sample. FIG. 21B: Frequency (%) of tumor and normal samples across the 2 clusters identified for the splicing event 6785 (SPATS2 gene). Cluster 2 is composed mostly of breast cancer samples. FIG. 21C: Exon splicing levels (PSI) for tumor specific cluster C2 and normal tissues in TCGA. This analysis indicates that the target exon is expressed in 77 breast cancer patients in cluster C2, while very low or absent in normal tissues. FIG. 21D: Survival analysis of breast cancer patients in cluster C2 versus the remaining breast cancer patients in TCGA. This analysis indicates that patients in C2 (expressing the target exon) have a worse overall survival (shorter survival time, days).

FIG. 22A: GMM analysis of mixed normal and breast cancer samples for the splicing event 8742 (DHRS11 gene). The GMM analysis showed 3 distinct clusters (subpopulations). The x-axis indicates the exon PSI (Ψ) level within samples, and y-axis denotes the number of samples in a normalized density scale. Shading indicates the cluster assignment of each sample. FIG. 22B: Frequency (%) of tumor and normal samples across the 3 clusters identified for the splicing event 8742 (DHRS11 gene). Cluster 3 is composed mostly of breast cancer samples. FIG. 22C: Exon splicing levels (PSI) for tumor specific cluster C3 and normal tissues in TCGA. This analysis indicates that the target exon is expressed in 44 breast cancer patients in cluster C3, while very low or absent in normal tissues. FIG. 22D: Survival analysis of breast cancer patients in cluster C3 versus the remaining breast cancer patients in TCGA. This analysis indicates that patients in C3 (expressing the target exon) have a worse overall survival (shorter survival time, days).

FIG. 23A: GMM analysis of mixed normal and breast cancer samples for the splicing event 1506 (CENPK gene). The GMM analysis showed 4 distinct clusters (subpopulations). The x-axis indicates the exon percent spliced in (PSI, Ψ) level within samples, and y-axis denotes the number of samples in a normalized density scale. Shading indicates the cluster assignment of each sample. FIG. 23B: Frequency (%) of tumor and normal samples across the 4 clusters identified for the splicing event 1506 (CENPK gene). Clusters 1-4 are composed mostly of breast cancer samples. FIG. 23C: Exon splicing levels (PSI) for tumor specific cluster C1 and normal tissues in TCGA. This analysis indicates that the target exon is expressed in 37 breast cancer patients in cluster C1, while very low or absent in normal tissues. FIG. 23D: Survival analysis of breast cancer patients in cluster C1 versus the remaining breast cancer patients in TCGA. This analysis indicates that patients in C1 (expressing the target exon) have a worse overall survival (shorter survival time, days).

FIG. 24A: GMM analysis of mixed normal and breast cancer samples for the splicing event 2098 (METTL5 gene). The GMM analysis showed 3 distinct clusters (subpopulations). The x-axis indicates the exon percent spliced in (PSI, Ψ) level within samples, and y-axis denotes the number of samples in a normalized density scale. Shading indicates the cluster assignment of each sample. FIG. 24B: Frequency (%) of tumor and normal samples across the 3 clusters identified for the splicing event 2098 (METTL5 gene). Clusters 1-3 are composed mostly of breast cancer samples. FIG. 24C: Exon splicing levels (PSI) for tumor specific cluster C1 and normal tissues in TCGA. This analysis indicates that the target exon is expressed in 38 breast cancer patients in cluster C1, while very low or absent in normal tissues. FIG. 24D: Survival analysis of breast cancer patients in cluster C1 versus the remaining breast cancer patients in TCGA. This analysis indicates that patients in C1 (expressing the target exon) have a worse overall survival (shorter survival time, days).

FIG. 25A: GMM analysis of mixed normal and breast cancer samples for the splicing event 2242 (PLA2R1 gene). The GMM analysis showed 3 distinct clusters (subpopulations). The x-axis indicates the exon percent spliced in (PSI, Ψ) level within samples, and y-axis denotes the number of samples in a normalized density scale. Shading indicates the cluster assignment of each sample. FIG. 25B: Frequency (%) of tumor and normal samples across the 3 clusters identified for the splicing event 2242 (PLA2R1 gene). Clusters 1-3 are composed mostly of breast cancer samples. FIG. 25C: Exon splicing levels (PSI) for tumor specific cluster C1 and normal tissues in TCGA. This analysis indicates that the target exon is expressed in 45 breast cancer patients in cluster C1, while very low or absent in normal tissues. FIG. 25D: Survival analysis of breast cancer patients in cluster C1 versus the remaining breast cancer patients in TCGA. This analysis indicates that patients in C1 (expressing the target exon) have a worse overall survival (shorter survival time, days).

FIG. 26A: GMM analysis of mixed normal and breast cancer samples for the splicing event 7106 (RHOH gene). The GMM analysis showed 3 distinct clusters (subpopulations). The x-axis indicates the exon percent spliced in (PSI, Ψ) level within samples, and y-axis denotes the number of samples in a normalized density scale. Shading indicates the cluster assignment of each sample. FIG. 26B: Frequency (%) of tumor and normal samples across the 3 clusters identified for the splicing event 7106 (RHOH gene). Clusters 1-3 are composed mostly of breast cancer samples. FIG. 26C: Exon splicing levels (PSI) for tumor specific cluster C1 and normal tissues in TCGA. This analysis indicates that the target exon is expressed in 48 breast cancer patients in cluster C1, while very low or absent in normal tissues. FIG. 26D: Survival analysis of breast cancer patients in cluster C1 versus the remaining breast cancer patients in TCGA. This analysis indicates that patients in C1 (expressing the target exon) have a worse overall survival (shorter survival time, days).

FIG. 27A: GMM analysis of mixed normal and breast cancer samples for the splicing event 7108 (RHOH gene). The GMM analysis showed 3 distinct clusters (subpopulations). The x-axis indicates the exon percent spliced in (PSI, Ψ) level within samples, and y-axis denotes the number of samples in a normalized density scale. Shading indicates the cluster assignment of each sample. FIG. 27B: Frequency (%) of tumor and normal samples across the 3 clusters identified for the splicing event 7108 (RHOH gene). Clusters 1-3 are composed mostly of breast cancer samples. FIG. 27C: Exon splicing levels (PSI) for tumor specific cluster C1 and normal tissues in TCGA. This analysis indicates that the target exon is expressed in 44 breast cancer patients in cluster C1, while very low or absent in normal tissues. FIG. 27D: Survival analysis of breast cancer patients in cluster C1 versus the remaining breast cancer patients in TCGA. This analysis indicates that patients in C1 (expressing the target exon) have a worse overall survival (shorter survival time, days).

FIG. 28A: GMM analysis of mixed normal and breast cancer samples for the splicing event 9442 (QPRT gene). The GMM analysis showed 3 distinct clusters (subpopulations). The x-axis indicates the exon percent spliced in (PSI, Ψ) level within samples, and y-axis denotes the number of samples in a normalized density scale. Shading indicates the cluster assignment of each sample. FIG. 28B: Frequency (%) of tumor and normal samples across the 3 clusters identified for the splicing event 9442 (QPRT gene). Clusters 1-2 are composed mostly of breast cancer samples. FIG. 28C: Exon splicing levels (PSI) for tumor specific cluster C1 and normal tissues in TCGA. This analysis indicates that the target exon is expressed in 40 breast cancer patients in cluster C1, while very low or absent in normal tissues. FIG. 28D: Survival analysis of breast cancer patients in cluster C1 versus the remaining breast cancer patients in TCGA. This analysis indicates that patients in C1 (expressing the target exon) have a worse overall survival (shorter survival time, days).

FIG. 29A: GMM analysis of mixed normal and breast cancer samples for the splicing event 10439 (IL17RB gene). The GMM analysis showed 2 distinct clusters (subpopulations). The x-axis indicates the exon percent spliced in (PSI, Ψ) level within samples, and y-axis denotes the number of samples in a normalized density scale. Shading indicates the cluster assignment of each sample. FIG. 29B: Frequency (%) of tumor and normal samples across the 2 clusters identified for the splicing event 10439 (IL17RB gene). Clusters 1-2 are composed mostly of breast cancer samples. FIG. 29C: Exon splicing levels (PSI) for tumor specific cluster C1 and normal tissues in TCGA. This analysis indicates that the target exon is expressed in 53 breast cancer patients in cluster C1, while very low or absent in normal tissues. FIG. 29D: Survival analysis of breast cancer patients in cluster C1 versus the remaining breast cancer patients in TCGA. This analysis indicates that patients in C1 (expressing the target exon) have a worse overall survival (shorter survival time, days).

FIG. 30A: GMM analysis of mixed normal and breast cancer samples for the splicing event 11685 (STAU1 gene). The GMM analysis showed 2 distinct clusters (subpopulations). The x-axis indicates the exon percent spliced in (PSI, Ψ) level within samples, and y-axis denotes the number of samples in a normalized density scale. Shading indicates the cluster assignment of each sample. FIG. 30B: Frequency (%) of tumor and normal samples across the 2 clusters identified for the splicing event 11685 (STAU1 gene). Clusters 1-2 are composed mostly of breast cancer samples. FIG. 30C: Exon splicing levels (PSI) for tumor specific cluster C1 and normal tissues in TCGA. This analysis indicates that the target exon is expressed in 37 breast cancer patients in cluster C1, while very low or absent in normal tissues. FIG. 30D: Survival analysis of breast cancer patients in cluster C1 versus the remaining breast cancer patients in TCGA. This analysis indicates that patients in C1 (expressing the target exon) have a worse overall survival (shorter survival time, days).

FIG. 31A: GMM analysis of mixed normal and breast cancer samples for the splicing event 13451 (LYRM1 gene). The GMM analysis showed 3 distinct clusters (subpopulations). The x-axis indicates the exon percent spliced in (PSI, Ψ) level within samples, and y-axis denotes the number of samples in a normalized density scale. Shading indicates the cluster assignment of each sample. FIG. 31B: Frequency (%) of tumor and normal samples across the 3 clusters identified for the splicing event 13451 (LYRM1 gene). Clusters 1-3 are composed mostly of breast cancer samples. FIG. 31C: Exon splicing levels (PSI) for tumor specific cluster C1 and normal tissues in TCGA. This analysis indicates that the target exon is expressed in 34 breast cancer patients in cluster C1, while very low or absent in normal tissues. FIG. 31D: Survival analysis of breast cancer patients in cluster C1 versus the remaining breast cancer patients in TCGA. This analysis indicates that patients in C1 (expressing the target exon) have a worse overall survival (shorter survival time, days).

FIG. 32A: GMM analysis of mixed normal and breast cancer samples for the splicing event 14574 (PPARG gene). The GMM analysis showed 3 distinct clusters (subpopulations). The x-axis indicates the exon percent spliced in (PSI, Ψ) level within samples, and y-axis denotes the number of samples in a normalized density scale. Shading indicates the cluster assignment of each sample. FIG. 32B: Frequency (%) of tumor and normal samples across the 3 clusters identified for the splicing event 14574 (PPARG gene). Clusters 1-3 are composed mostly of breast cancer samples. FIG. 32C: Exon splicing levels (PSI) for tumor specific cluster C1 and normal tissues in TCGA. This analysis indicates that the target exon is expressed in 33 breast cancer patients in cluster C1, while very low or absent in normal tissues. FIG. 32D: Survival analysis of breast cancer patients in cluster C1 versus the remaining breast cancer patients in TCGA. This analysis indicates that patients in C1 (expressing the target exon) have a better overall survival (longer survival time, days).

FIG. 33A: GMM analysis of mixed normal and breast cancer samples for the splicing event 16269 (BORCS8-MEF2B gene). The GMM analysis showed 3 distinct clusters (subpopulations). The x-axis indicates the exon percent spliced in (PSI, Ψ) level within samples, and y-axis denotes the number of samples in a normalized density scale. Shading indicates the cluster assignment of each sample. FIG. 33B: Frequency (%) of tumor and normal samples across the 3 clusters identified for the splicing event 16269 (BORCS8-MEF2B gene). Clusters 1-3 are composed mostly of breast cancer samples. FIG. 33C: Exon splicing levels (PSI) for tumor specific cluster C1 and normal tissues in TCGA. This analysis indicates that the target exon is expressed in 43 breast cancer patients in cluster C1, while very low or absent in normal tissues. FIG. 33D: Survival analysis of breast cancer patients in cluster C1 versus the remaining breast cancer patients in TCGA. This analysis indicates that patients in C1 (expressing the target exon) have a worse overall survival (shorter survival time, days).

FIG. 34A: GMM analysis of mixed normal and breast cancer samples for the splicing event 16833 (ENOSF1 gene). The GMM analysis showed 3 distinct clusters (subpopulations). The x-axis indicates the exon percent spliced in (PSI, Ψ) level within samples, and y-axis denotes the number of samples in a normalized density scale. Shading indicates the cluster assignment of each sample. FIG. 34B: Frequency (%) of tumor and normal samples across the 3 clusters identified for the splicing event 16833 (ENOSF1 gene). Clusters 1-3 are composed mostly of breast cancer samples. FIG. 34C: Exon splicing levels (PSI) for tumor specific cluster C1 and normal tissues in TCGA. This analysis indicates that the target exon is expressed in 46 breast cancer patients in cluster C1, while very low or absent in normal tissues. FIG. 34D: Survival analysis of breast cancer patients in cluster C1 versus the remaining breast cancer patients in TCGA. This analysis indicates that patients in C1 (expressing the target exon) have a worse overall survival (shorter survival time, days).

FIG. 35A: GMM analysis of mixed normal and breast cancer samples for the splicing event 16929 (DHRS4-AS1 gene). The GMM analysis showed 3 distinct clusters (subpopulations). The x-axis indicates the exon percent spliced in (PSI, Ψ) level within samples, and y-axis denotes the number of samples in a normalized density scale. Shading indicates the cluster assignment of each sample. FIG. 35B: Frequency (%) of tumor and normal samples across the 3 clusters identified for the splicing event 16929 (DHRS4-AS1 gene). Clusters 1-2 are composed mostly of breast cancer samples. FIG. 35C: Exon splicing levels (PSI) for tumor specific cluster C1 and normal tissues in TCGA. This analysis indicates that the target exon is expressed in 83 breast cancer patients in cluster C1, while very low or absent in normal tissues. FIG. 35D: Survival analysis of breast cancer patients in cluster C1 versus the remaining breast cancer patients in TCGA. This analysis indicates that patients in C1 (expressing the target exon) have a worse overall survival (shorter survival time, days).

FIG. 36A: GMM analysis of mixed normal and breast cancer samples for the splicing event 16943 (NDUFV2 gene). The GMM analysis showed 4 distinct clusters (subpopulations). The x-axis indicates the exon percent spliced in (PSI, Ψ) level within samples, and y-axis denotes the number of samples in a normalized density scale. Shading indicates the cluster assignment of each sample. FIG. 36B: Frequency (%) of tumor and normal samples across the 4 clusters identified for the splicing event 16943 (NDUFV2 gene). Clusters 1-4 are composed mostly of breast cancer samples. FIG. 36C: Exon splicing levels (PSI) for tumor specific cluster C3 and normal tissues in TCGA. This analysis indicates that the target exon is expressed in 58 breast cancer patients in cluster C3, while very low or absent in normal tissues except bladder. FIG. 36D: Survival analysis of breast cancer patients in cluster C3 versus the remaining breast cancer patients in TCGA. This analysis indicates that patients in C3 (expressing the target exon) have a worse overall survival (shorter survival time, days).

FIG. 37A: GMM analysis of mixed normal and breast cancer samples for the splicing event 18745 (FER1L4 gene). The GMM analysis showed 4 distinct clusters (subpopulations). The x-axis indicates the exon percent spliced in (PSI, Ψ) level within samples, and y-axis denotes the number of samples in a normalized density scale. Shading indicates the cluster assignment of each sample. FIG. 37B: Frequency (%) of tumor and normal samples across the 4 clusters identified for the splicing event 18745 (FER1L4 gene). Clusters 1-4 are composed mostly of breast cancer samples. FIG. 37C: Exon splicing levels (PSI) for tumor specific cluster C2 and normal tissues in TCGA. This analysis indicates that the target exon is expressed in 89 breast cancer patients in cluster C2, while very low or absent in normal tissues. FIG. 37D: Survival analysis of breast cancer patients in cluster C2 versus the remaining breast cancer patients in TCGA. This analysis indicates that patients in C2 (expressing the target exon) have a better overall survival (longer survival time, days).

FIG. 38A: GMM analysis of mixed normal and breast cancer samples for the splicing event 19824 (PHF14 gene). The GMM analysis showed 2 distinct clusters (subpopulations). The x-axis indicates the exon percent spliced in (PSI, Ψ) level within samples, and y-axis denotes the number of samples in a normalized density scale. Shading indicates the cluster assignment of each sample. FIG. 38B: Frequency (%) of tumor and normal samples across the 2 clusters identified for the splicing event 19824 (PHF14 gene). Clusters 1-2 are composed mostly of breast cancer samples. FIG. 38C: Exon splicing levels (PSI) for tumor specific cluster C1 and normal tissues in TCGA. This analysis indicates that the target exon is expressed in 111 breast cancer patients in cluster C1, while very low or absent in normal tissues. FIG. 38D: Survival analysis of breast cancer patients in cluster C1 versus the remaining breast cancer patients in TCGA. This analysis indicates that patients in C1 (expressing the target exon) have a worse overall survival (shorter survival time, days).

FIG. 39A: GMM analysis of mixed normal and breast cancer samples for the splicing event 19828 (PHF14 gene). The GMM analysis showed 2 distinct clusters (subpopulations). The x-axis indicates the exon percent spliced in (PSI, Ψ) level within samples, and y-axis denotes the number of samples in a normalized density scale. Shading indicates the cluster assignment of each sample. FIG. 39B: Frequency (%) of tumor and normal samples across the 2 clusters identified for the splicing event 19828 (PHF14 gene). Clusters 1-2 are composed mostly of breast cancer samples. FIG. 39C: Exon splicing levels (PSI) for tumor specific cluster C1 and normal tissues in TCGA. This analysis indicates that the target exon is expressed in 111 breast cancer patients in cluster C1, while very low or absent in normal tissues. FIG. 39D: Survival analysis of breast cancer patients in cluster C1 versus the remaining breast cancer patients in TCGA. This analysis indicates that patients in C1 (expressing the target exon) have a worse overall survival (shorter survival time, days).

FIG. 40A: GMM analysis of mixed normal and breast cancer samples for the splicing event 21024 (BCL2L13 gene). The GMM analysis showed 4 distinct clusters (subpopulations). The x-axis indicates the exon percent spliced in (PSI, Ψ) level within samples, and y-axis denotes the number of samples in a normalized density scale. Shading indicates the cluster assignment of each sample. FIG. 40B: Frequency (%) of tumor and normal samples across the 4 clusters identified for the splicing event 21024 (BCL2L13 gene). Clusters 1-4 are composed mostly of breast cancer samples. FIG. 40C: Exon splicing levels (PSI) for tumor specific cluster C1 and normal tissues in TCGA. This analysis indicates that the target exon is expressed in 35 breast cancer patients in cluster C1, while very low or absent in normal tissues. FIG. 40D: Survival analysis of breast cancer patients in cluster C1 versus the remaining breast cancer patients in TCGA. This analysis indicates that patients in C1 (expressing the target exon) have a worse overall survival (shorter survival time, days).

FIG. 41A: GMM analysis of mixed normal and breast cancer samples for the splicing event 22227 (SELENBP1 gene). The GMM analysis showed 2 distinct clusters (subpopulations). The x-axis indicates the exon percent spliced in (PSI, Ψ) level within samples, and y-axis denotes the number of samples in a normalized density scale. Shading indicates the cluster assignment of each sample. FIG. 41B: Frequency (%) of tumor and normal samples across the 2 clusters identified for the splicing event 22227 (SELENBP1 gene). Clusters 1-2 are composed mostly of breast cancer samples. FIG. 41C: Exon splicing levels (PSI) for tumor specific cluster C1 and normal tissues in TCGA. This analysis indicates that the target exon is expressed in 86 breast cancer patients in cluster C1, while very low or absent in normal tissues. FIG. 41D: Survival analysis of breast cancer patients in cluster C1 versus the remaining breast cancer patients in TCGA. This analysis indicates that patients in C1 (expressing the target exon) have a better overall survival (longer survival time, days).

FIG. 42A: GMM analysis of mixed normal and breast cancer samples for the splicing event 24742 (LINC00630 gene). The GMM analysis showed 3 distinct clusters (subpopulations). The x-axis indicates the exon percent spliced in (PSI, Ψ) level within samples, and y-axis denotes the number of samples in a normalized density scale. Shading indicates the cluster assignment of each sample. FIG. 42B: Frequency (%) of tumor and normal samples across the 3 clusters identified for the splicing event 24742 (LINC00630 gene). Clusters 1-3 are composed mostly of breast cancer samples. FIG. 42C: Exon splicing levels (PSI) for tumor specific cluster C2 and normal tissues in TCGA. This analysis indicates that the target exon is expressed in 38 breast cancer patients in cluster C2, while very low or absent in normal tissues except uterus. FIG. 42D: Survival analysis of breast cancer patients in cluster C2 versus the remaining breast cancer patients in TCGA. This analysis indicates that patients in C2 (expressing the target exon) have a worse overall survival (shorter survival time, days).

FIG. 43A: GMM analysis of mixed normal and breast cancer samples for the splicing event 27194 (CTBP2 gene). The GMM analysis showed 3 distinct clusters (subpopulations). The x-axis indicates the exon percent spliced in (PSI, Ψ) level within samples, and y-axis denotes the number of samples in a normalized density scale. Shading indicates the cluster assignment of each sample. FIG. 43B: Frequency (%) of tumor and normal samples across the 3 clusters identified for the splicing event 27194 (CTBP2 gene). Clusters 1-3 are composed mostly of breast cancer samples. FIG. 43C: Exon splicing levels (PSI) for tumor specific cluster C1 and normal tissues in TCGA. This analysis indicates that the target exon is expressed in 33 breast cancer patients in cluster C1, while very low or absent in normal tissues. FIG. 43D: Survival analysis of breast cancer patients in cluster C1 versus the remaining breast cancer patients in TCGA. This analysis indicates that patients in C1 (expressing the target exon) have a worse overall survival (shorter survival time, days).

FIG. 44A: GMM analysis of mixed normal and breast cancer samples for the splicing event 30244 (SLC52A2 gene). The GMM analysis showed 4 distinct clusters (subpopulations). The x-axis indicates the exon percent spliced in (PSI, Ψ) level within samples, and y-axis denotes the number of samples in a normalized density scale. Shading indicates the cluster assignment of each sample. FIG. 44B: Frequency (%) of tumor and normal samples across the 4 clusters identified for the splicing event 30244 (SLC52A2 gene). Clusters 1-3 are composed mostly of breast cancer samples. FIG. 44C: Exon splicing levels (PSI) for tumor specific cluster C3 and normal tissues in TCGA. This analysis indicates that the target exon is expressed in 310 breast cancer patients in cluster C3, while very low or absent in normal tissues. FIG. 44D: Survival analysis of breast cancer patients in cluster C3 versus the remaining breast cancer patients in TCGA. This analysis indicates that patients in C3 (expressing the target exon) have a worse overall survival (shorter survival time, days).

FIG. 45A: GMM analysis of mixed normal and breast cancer samples for the splicing event 33377 (SLC38A1 gene). The GMM analysis showed 3 distinct clusters (subpopulations). The x-axis indicates the exon percent spliced in (PSI, Ψ) level within samples, and y-axis denotes the number of samples in a normalized density scale. Shading indicates the cluster assignment of each sample. FIG. 45B: Frequency (%) of tumor and normal samples across the 3 clusters identified for the splicing event 33377 (SLC38A1 gene). Clusters 1-3 are composed mostly of breast cancer samples. FIG. 45C: Exon splicing levels (PSI) for tumor specific cluster C2 and normal tissues in TCGA. This analysis indicates that the target exon is expressed in 52 breast cancer patients in cluster C2, while very low or absent in normal tissues except stomach. FIG. 45D: Survival analysis of breast cancer patients in cluster C2 versus the remaining breast cancer patients in TCGA. This analysis indicates that patients in C2 (expressing the target exon) have a worse overall survival (shorter survival time, days).

FIG. 46A: GMM analysis of mixed normal and breast cancer samples for the splicing event 40521 (FAM65A gene). The GMM analysis showed 3 distinct clusters (subpopulations). The x-axis indicates the exon percent spliced in (PSI, Ψ) level within samples, and y-axis denotes the number of samples in a normalized density scale. Shading indicates the cluster assignment of each sample. FIG. 46B: Frequency (%) of tumor and normal samples across the 3 clusters identified for the splicing event 40521 (FAM65A gene). Clusters 1-3 are composed mostly of breast cancer samples. FIG. 46C: Exon splicing levels (PSI) for tumor specific cluster C1 and normal tissues in TCGA. This analysis indicates that the target exon is expressed in 32 breast cancer patients in cluster C1, while very low or absent in normal tissues. FIG. 46D: Survival analysis of breast cancer patients in cluster C1 versus the remaining breast cancer patients in TCGA. This analysis indicates that patients in C1 (expressing the target exon) have a worse overall survival (shorter survival time, days).

FIG. 47A: GMM analysis of mixed normal and breast cancer samples for the splicing event 41168 (USP25 gene). The GMM analysis showed 3 distinct clusters (subpopulations). The x-axis indicates the exon percent spliced in (PSI, Ψ) level within samples, and y-axis denotes the number of samples in a normalized density scale. Shading indicates the cluster assignment of each sample. FIG. 47B: Frequency (%) of tumor and normal samples across the 3 clusters identified for the splicing event 41168 (USP25 gene). Clusters 1-3 are composed mostly of breast cancer samples. FIG. 47C: Exon splicing levels (PSI) for tumor specific cluster C1 and normal tissues in TCGA. This analysis indicates that the target exon is expressed in 31 breast cancer patients in cluster C1, while very low or absent in normal tissues. FIG. 47D: Survival analysis of breast cancer patients in cluster C1 versus the remaining breast cancer patients in TCGA. This analysis indicates that patients in C1 (expressing the target exon) have a worse overall survival (shorter survival time, days).

FIG. 48A: GMM analysis of mixed normal and breast cancer samples for the splicing event 45885 (HMOX2 gene). The GMM analysis showed 4 distinct clusters (subpopulations). The x-axis indicates the exon percent spliced in (PSI, Ψ) level within samples, and y-axis denotes the number of samples in a normalized density scale. Shading indicates the cluster assignment of each sample. FIG. 48B: Frequency (%) of tumor and normal samples across the 4 clusters identified for the splicing event 45885 (HMOX2 gene). Clusters 1-3 are composed mostly of breast cancer samples. FIG. 48C: Exon splicing levels (PSI) for tumor specific cluster C2 and normal tissues in TCGA. This analysis indicates that the target exon is expressed in 151 breast cancer patients in cluster C2, while very low or absent in normal tissues. FIG. 48D: Survival analysis of breast cancer patients in cluster C2 versus the remaining breast cancer patients in TCGA. This analysis indicates that patients in C2 (expressing the target exon) have a worse overall survival (shorter survival time, days).

FIG. 49A: GMM analysis of mixed normal and breast cancer samples for the splicing event 50148 (MKRN2OS gene). The GMM analysis showed 4 distinct clusters (subpopulations). The x-axis indicates the exon percent spliced in (PSI, Ψ) level within samples, and y-axis denotes the number of samples in a normalized density scale. Shading indicates the cluster assignment of each sample. FIG. 49B: Frequency (%) of tumor and normal samples across the 4 clusters identified for the splicing event 50148 (MKRN2OS gene). Clusters 1-4 are composed mostly of breast cancer samples. FIG. 49C: Exon splicing levels (PSI) for tumor specific cluster C2 and normal tissues in TCGA. This analysis indicates that the target exon is expressed in 40 breast cancer patients in cluster C2, while very low or absent in normal tissues. FIG. 49D: Survival analysis of breast cancer patients in cluster C2 versus the remaining breast cancer patients in TCGA. This analysis indicates that patients in C2 (expressing the target exon) have a worse overall survival (shorter survival time, days).

FIG. 50A: GMM analysis of mixed normal and breast cancer samples for the splicing event 52249 (ATP8A2P1 gene). The GMM analysis showed 2 distinct clusters (subpopulations). The x-axis indicates the exon percent spliced in (PSI, Ψ) level within samples, and y-axis denotes the number of samples in a normalized density scale. Shading indicates the cluster assignment of each sample. FIG. 50B: Frequency (%) of tumor and normal samples across the 2 clusters identified for the splicing event 52249 (ATP8A2P1 gene). Clusters 1-2 are composed mostly of breast cancer samples. FIG. 50C: Exon splicing levels (PSI) for tumor specific cluster C1 and normal tissues in TCGA. This analysis indicates that the target exon is expressed in 33 breast cancer patients in cluster C1, while very low or absent in normal tissues. FIG. 50D: Survival analysis of breast cancer patients in cluster C1 versus the remaining breast cancer patients in TCGA. This analysis indicates that patients in C1 (expressing the target exon) have a worse overall survival (shorter survival time, days).

FIG. 51A: GMM analysis of mixed normal and breast cancer samples for the splicing event 53188 (HIBCH gene). The GMM analysis showed 3 distinct clusters (subpopulations). The x-axis indicates the exon percent spliced in (PSI, Ψ) level within samples, and y-axis denotes the number of samples in a normalized density scale. Shading indicates the cluster assignment of each sample. FIG. 51B: Frequency (%) of tumor and normal samples across the 3 clusters identified for the splicing event 53188 (HIBCH gene). Clusters 1-3 are composed mostly of breast cancer samples. FIG. 51C: Exon splicing levels (PSI) for tumor specific cluster C1 and normal tissues in TCGA. This analysis indicates that the target exon is expressed in 129 breast cancer patients in cluster C1, while very low or absent in normal tissues. FIG. 51D: Survival analysis of breast cancer patients in cluster C1 versus the remaining breast cancer patients in TCGA. This analysis indicates that patients in C1 (expressing the target exon) have a worse overall survival (shorter survival time, days).

FIG. 52A: GMM analysis of mixed normal and breast cancer samples for the splicing event 58853 (SLC35C2 gene). The GMM analysis showed 3 distinct clusters (subpopulations). The x-axis indicates the exon percent spliced in (PSI, Ψ) level within samples, and y-axis denotes the number of samples in a normalized density scale. Shading indicates the cluster assignment of each sample. FIG. 52B: Frequency (%) of tumor and normal samples across the 3 clusters identified for the splicing event 58853 (SLC35C2 gene). Clusters 1-3 are composed mostly of breast cancer samples. FIG. 52C: Exon splicing levels (PSI) for tumor specific cluster C1 and normal tissues in TCGA. This analysis indicates that the target exon is expressed in 40 breast cancer patients in cluster C1, while very low or absent in normal tissues. FIG. 52D: Survival analysis of breast cancer patients in cluster C1 versus the remaining breast cancer patients in TCGA. This analysis indicates that patients in C1 (expressing the target exon) have a worse overall survival (shorter survival time, days).

FIG. 53A: GMM analysis of mixed normal and breast cancer samples for the splicing event 59314 (TRIMS gene). The GMM analysis showed 3 distinct clusters (subpopulations). The x-axis indicates the exon percent spliced in (PSI, Ψ) level within samples, and y-axis denotes the number of samples in a normalized density scale. Shading indicates the cluster assignment of each sample. FIG. 53B: Frequency (%) of tumor and normal samples across the 3 clusters identified for the splicing event 59314 (TRIMS gene). Clusters 1-3 are composed mostly of breast cancer samples. FIG. 53C: Exon splicing levels (PSI) for tumor specific cluster C2 and normal tissues in TCGA. This analysis indicates that the target exon is expressed in 61 breast cancer patients in cluster C2, while very low or absent in normal tissues. FIG. 53D: Survival analysis of breast cancer patients in cluster C1 versus the remaining breast cancer patients in TCGA. This analysis indicates that patients in C1 (expressing the target exon) have a worse overall survival (shorter survival time, days).

FIG. 54A: GMM analysis of mixed normal and breast cancer samples for the splicing event 60239 (HSD17B6 gene). The GMM analysis showed 4 distinct clusters (subpopulations). The x-axis indicates the exon percent spliced in (PSI, Ψ) level within samples, and y-axis denotes the number of samples in a normalized density scale. Shading indicates the cluster assignment of each sample. FIG. 54B: Frequency (%) of tumor and normal samples across the 4 clusters identified for the splicing event 60239 (HSD17B6 gene). Clusters 1-4 are composed mostly of breast cancer samples. FIG. 54C: Exon splicing levels (PSI) for tumor specific clusters C2 and C3 and normal tissues in TCGA. This analysis indicates that the target exon is expressed in 130 breast cancer patients in cluster C2 and 214 breast cancer patients in cluster C3 while being very low or absent in normal tissues except breast. FIG. 54D: Survival analysis of breast cancer patients in cluster C1 versus the remaining breast cancer patients in TCGA. This analysis indicates that patients in C1 (expressing the target exon) have a worse overall survival (shorter survival time, days).
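
The survival panels described above compare patients in a tumor-specific cluster with the remaining breast cancer patients. The figure descriptions do not state which estimator was used, so the following is only a minimal sketch that assumes a Kaplan-Meier estimate of overall survival with right-censoring. The function names and all follow-up times and event flags below are made up for illustration.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival curve.

    times:  follow-up time in days for each patient
    events: 1 if death was observed at that time, 0 if the patient was censored
    Returns a list of (time, survival_probability) at each distinct death time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed  # deaths and censored patients both leave the risk set
    return curve


def survival_at(curve, day):
    """Estimated survival probability at the given day (step function)."""
    prob = 1.0
    for t, s in curve:
        if t > day:
            break
        prob = s
    return prob


# Hypothetical patients: one tumor-specific cluster versus the remaining patients.
cluster_curve = kaplan_meier([120, 200, 340, 400, 500], [1, 1, 1, 0, 1])
rest_curve = kaplan_meier([300, 450, 600, 700, 800], [0, 1, 0, 1, 0])
cluster_is_worse_at_one_year = survival_at(cluster_curve, 365) < survival_at(rest_curve, 365)
```

A formal comparison of two curves would typically add a significance test such as the log-rank test, which is omitted here for brevity.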

DETAILED DESCRIPTION

Alternative splicing is a key mechanism of biological diversity in eukaryotes because it allows multiple mRNA isoforms to be transcribed and translated from a single gene. The human genome includes more than 20,000 genes; however, more than 95% of multi-exonic pre-mRNAs are alternatively spliced to generate nearly 200,000 isoforms. The alternative splicing isoforms translated into proteins can have distinct or even opposing functions. Alternative splicing is involved in a wide range of biological processes, including immune cell maturation and processing.

Studies examining the cancer transcriptome have enabled unprecedented insight into cancer cell heterogeneity and generated novel classifications. This progress has not yet fully translated into clinical benefit. Isoforms as well as alterations in alternative splicing are associated with numerous diseases and can contribute to cancer malignancy by regulating the expression of oncogenes and tumor suppressors. Aberrant alternative splicing profiles can arise in cancer due to mutations at the splice sites or splicing-regulatory elements, but can also reflect changes in splicing regulators. Recurrent mutations in core splicing machinery are found in myeloid leukemia, and sporadic mutations are found in lung and breast cancer, suggesting that alternative splicing alterations play a key role in tumorigenesis. Alterations in alternative splicing result in the generation of a repertoire of novel isoforms in tumors that, together with fusion molecules, can be viewed as another class of neoantigens.

Provided herein, in some aspects, are methods that comprise assaying a sample for a particular cancer isoform including or excluding a particular exon. In some embodiments, a sample is assayed for multiple exon inclusion or exon exclusion isoforms as provided herein. The data provided by the present disclosure demonstrate that at least one of fifty-two different exon inclusion or exon exclusion isoforms can be detected in ~91% of all breast cancer samples tested.

Methods of Detection

Some aspects of the present disclosure comprise assaying a sample for (the presence or absence of) a nucleic acid (e.g., an exon inclusion event or an exon exclusion event) comprising a nucleotide sequence (e.g., an exon) of any one of SEQ ID NOS: 21-40 and 105-136. It should be understood that the phrase “assaying a sample for a nucleic acid comprising a nucleotide sequence of SEQ ID NO: X” encompasses assaying a sample for the presence or absence of a nucleic acid that includes the full-length nucleotide sequence identified by SEQ ID NO: X (all nucleotides of SEQ ID NO: X); and the phrase also includes assaying a sample for the presence or absence of a nucleic acid that includes a fragment of the nucleotide sequence identified by SEQ ID NO: X. The length of the fragment is not limited and may be, for example, at least 50, at least 60, at least 70, at least 80, at least 90, or at least 100 nucleotides.
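As a minimal sketch of the fragment language above, the following Python function checks whether a sequence obtained from a sample contains either the full-length reference sequence or a contiguous fragment of at least a chosen length. The function name, the default of 50 nucleotides, and the exact-match criterion are illustrative assumptions; actual assays tolerate mismatches, and the actual sequences are those of the sequence listing.

```python
def contains_fragment(sample_seq: str, reference: str, min_len: int = 50) -> bool:
    """Return True if sample_seq contains a contiguous stretch of `reference`
    at least min_len nucleotides long (or the whole reference, if it is shorter).

    Hypothetical helper for illustration only; exact string matching is assumed.
    """
    sample_seq = sample_seq.upper()
    reference = reference.upper()
    window = min(min_len, len(reference))
    if window == 0:
        return False
    # Any shared fragment of length >= window contains at least one window-sized
    # substring of the reference, so checking every window is sufficient.
    for start in range(len(reference) - window + 1):
        if reference[start:start + window] in sample_seq:
            return True
    return False
```

Raising `min_len` toward the full length of the reference makes the check stricter; setting it to the reference length is equivalent to requiring the full-length sequence.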

In some embodiments, the methods comprise assaying a sample for anucleic acid comprising the nucleotide sequence of SEQ ID NO: 21. Insome embodiments, the methods comprise assaying a sample for a nucleicacid comprising the nucleotide sequence of SEQ ID NO: 22. In someembodiments, the methods comprise assaying a sample for a nucleic acidcomprising the nucleotide sequence of SEQ ID NO: 23. In someembodiments, the methods comprise assaying a sample for a nucleic acidcomprising the nucleotide sequence of SEQ ID NO: 24. In someembodiments, the methods comprise assaying a sample for a nucleic acidcomprising the nucleotide sequence of SEQ ID NO: 25. In someembodiments, the methods comprise assaying a sample for a nucleic acidcomprising the nucleotide sequence of SEQ ID NO: 26. In someembodiments, the methods comprise assaying a sample for a nucleic acidcomprising the nucleotide sequence of SEQ ID NO: 27. In someembodiments, the methods comprise assaying a sample for a nucleic acidcomprising the nucleotide sequence of SEQ ID NO: 28. In someembodiments, the methods comprise assaying a sample for a nucleic acidcomprising the nucleotide sequence of SEQ ID NO: 29. In someembodiments, the methods comprise assaying a sample for a nucleic acidcomprising the nucleotide sequence of SEQ ID NO: 30. In someembodiments, the methods comprise assaying a sample for a nucleic acidcomprising the nucleotide sequence of SEQ ID NO: 31. In someembodiments, the methods comprise assaying a sample for a nucleic acidcomprising the nucleotide sequence of SEQ ID NO: 32. In someembodiments, the methods comprise assaying a sample for a nucleic acidcomprising the nucleotide sequence of SEQ ID NO: 33. In someembodiments, the methods comprise assaying a sample for a nucleic acidcomprising the nucleotide sequence of SEQ ID NO: 34. In someembodiments, the methods comprise assaying a sample for a nucleic acidcomprising the nucleotide sequence of SEQ ID NO: 35. 
In someembodiments, the methods comprise assaying a sample for a nucleic acidcomprising the nucleotide sequence of SEQ ID NO: 36. In someembodiments, the methods comprise assaying a sample for a nucleic acidcomprising the nucleotide sequence of SEQ ID NO: 37. In someembodiments, the methods comprise assaying a sample for a nucleic acidcomprising the nucleotide sequence of SEQ ID NO: 38. In someembodiments, the methods comprise assaying a sample for a nucleic acidcomprising the nucleotide sequence of SEQ ID NO: 39. In someembodiments, the methods comprise assaying a sample for a nucleic acidcomprising the nucleotide sequence of SEQ ID NO: 40. In someembodiments, the methods comprise assaying a sample for a nucleic acidcomprising the nucleotide sequence of SEQ ID NO: 105. In someembodiments, the methods comprise assaying a sample for a nucleic acidcomprising the nucleotide sequence of SEQ ID NO: 106. In someembodiments, the methods comprise assaying a sample for a nucleic acidcomprising the nucleotide sequence of SEQ ID NO: 107. In someembodiments, the methods comprise assaying a sample for a nucleic acidcomprising the nucleotide sequence of SEQ ID NO: 108. In someembodiments, the methods comprise assaying a sample for a nucleic acidcomprising the nucleotide sequence of SEQ ID NO: 109. In someembodiments, the methods comprise assaying a sample for a nucleic acidcomprising the nucleotide sequence of SEQ ID NO: 110. In someembodiments, the methods comprise assaying a sample for a nucleic acidcomprising the nucleotide sequence of SEQ ID NO: 111. In someembodiments, the methods comprise assaying a sample for a nucleic acidcomprising the nucleotide sequence of SEQ ID NO: 112. In someembodiments, the methods comprise assaying a sample for a nucleic acidcomprising the nucleotide sequence of SEQ ID NO: 113. In someembodiments, the methods comprise assaying a sample for a nucleic acidcomprising the nucleotide sequence of SEQ ID NO: 114. 
In someembodiments, the methods comprise assaying a sample for a nucleic acidcomprising the nucleotide sequence of SEQ ID NO: 115. In someembodiments, the methods comprise assaying a sample for a nucleic acidcomprising the nucleotide sequence of SEQ ID NO: 116. In someembodiments, the methods comprise assaying a sample for a nucleic acidcomprising the nucleotide sequence of SEQ ID NO: 117. In someembodiments, the methods comprise assaying a sample for a nucleic acidcomprising the nucleotide sequence of SEQ ID NO: 118. In someembodiments, the methods comprise assaying a sample for a nucleic acidcomprising the nucleotide sequence of SEQ ID NO: 119. In someembodiments, the methods comprise assaying a sample for a nucleic acidcomprising the nucleotide sequence of SEQ ID NO: 120. In someembodiments, the methods comprise assaying a sample for a nucleic acidcomprising the nucleotide sequence of SEQ ID NO: 121. In someembodiments, the methods comprise assaying a sample for a nucleic acidcomprising the nucleotide sequence of SEQ ID NO: 122. In someembodiments, the methods comprise assaying a sample for a nucleic acidcomprising the nucleotide sequence of SEQ ID NO: 123. In someembodiments, the methods comprise assaying a sample for a nucleic acidcomprising the nucleotide sequence of SEQ ID NO: 124. In someembodiments, the methods comprise assaying a sample for a nucleic acidcomprising the nucleotide sequence of SEQ ID NO: 125. In someembodiments, the methods comprise assaying a sample for a nucleic acidcomprising the nucleotide sequence of SEQ ID NO: 126. In someembodiments, the methods comprise assaying a sample for a nucleic acidcomprising the nucleotide sequence of SEQ ID NO: 127. In someembodiments, the methods comprise assaying a sample for a nucleic acidcomprising the nucleotide sequence of SEQ ID NO: 128. In someembodiments, the methods comprise assaying a sample for a nucleic acidcomprising the nucleotide sequence of SEQ ID NO: 129. 
In someembodiments, the methods comprise assaying a sample for a nucleic acidcomprising the nucleotide sequence of SEQ ID NO: 130. In someembodiments, the methods comprise assaying a sample for a nucleic acidcomprising the nucleotide sequence of SEQ ID NO: 131. In someembodiments, the methods comprise assaying a sample for a nucleic acidcomprising the nucleotide sequence of SEQ ID NO: 132. In someembodiments, the methods comprise assaying a sample for a nucleic acidcomprising the nucleotide sequence of SEQ ID NO: 133. In someembodiments, the methods comprise assaying a sample for a nucleic acidcomprising the nucleotide sequence of SEQ ID NO: 134. In someembodiments, the methods comprise assaying a sample for a nucleic acidcomprising the nucleotide sequence of SEQ ID NO: 135. In someembodiments, the methods comprise assaying a sample for a nucleic acidcomprising the nucleotide sequence of SEQ ID NO: 136.

In some embodiments, methods of the present disclosure comprise assaying a sample for a (at least one) nucleic acid comprising a nucleotide sequence of any one of SEQ ID NOS: 22-24, 27-34, 36, 38, or 40. In some embodiments, the methods further comprise assaying the sample for a nucleic acid comprising a nucleotide sequence of any one of SEQ ID NOS: 21, 25, 26, 35, 37, or 39.

In some embodiments, methods of the present disclosure comprise assaying the sample for a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 21, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 22, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 23, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 24, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 25, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 26, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 27, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 28, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 29, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 30, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 31, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 32, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 33, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 34, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 35, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 36, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 37, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 38, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 39, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 40, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 105, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 106, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 107, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 108, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 109, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 110, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 111, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 112, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 113, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 114, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 115, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 116, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 117, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 118, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 119, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 120, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 121, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 122, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 123, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 124, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 125, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 126, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 127, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 128, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 129, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 130, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 131, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 132, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 133, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 134, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 135, and a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 136.

In some embodiments, the methods of the present disclosure comprise assaying the sample for 2 (or at least 2) of the 52 exons (selected from exons comprising a nucleotide sequence of any one of SEQ ID NOS: 21-40 and 105-136). In some embodiments, the methods of the present disclosure comprise assaying the sample for 3 (or at least 3) of the 52 exons. In some embodiments, the methods of the present disclosure comprise assaying the sample for 4 (or at least 4) of the 52 exons. In some embodiments, the methods of the present disclosure comprise assaying the sample for 5 (or at least 5) of the 52 exons. In some embodiments, the methods of the present disclosure comprise assaying the sample for 6 (or at least 6) of the 52 exons. In some embodiments, the methods of the present disclosure comprise assaying the sample for 7 (or at least 7) of the 52 exons. In some embodiments, the methods of the present disclosure comprise assaying the sample for 8 (or at least 8) of the 52 exons. In some embodiments, the methods of the present disclosure comprise assaying the sample for 9 (or at least 9) of the 52 exons. In some embodiments, the methods of the present disclosure comprise assaying the sample for 10 (or at least 10) of the 52 exons. In some embodiments, the methods of the present disclosure comprise assaying the sample for 11 (or at least 11) of the 52 exons. In some embodiments, the methods of the present disclosure comprise assaying the sample for 12 (or at least 12) of the 52 exons. In some embodiments, the methods of the present disclosure comprise assaying the sample for 13 (or at least 13) of the 52 exons. In some embodiments, the methods of the present disclosure comprise assaying the sample for 14 (or at least 14) of the 52 exons. In some embodiments, the methods of the present disclosure comprise assaying the sample for 15 (or at least 15) of the 52 exons. In some embodiments, the methods of the present disclosure comprise assaying the sample for 16 (or at least 16) of the 52 exons. In some embodiments, the methods of the present disclosure comprise assaying the sample for 17 (or at least 17) of the 52 exons. In some embodiments, the methods of the present disclosure comprise assaying the sample for 18 (or at least 18) of the 52 exons. In some embodiments, the methods of the present disclosure comprise assaying the sample for 19 (or at least 19) of the 52 exons. In some embodiments, the methods of the present disclosure comprise assaying the sample for 20 (or at least 20) of the 52 exons. In some embodiments, the methods of the present disclosure comprise assaying the sample for 21 (or at least 21) of the 52 exons. In some embodiments, the methods of the present disclosure comprise assaying the sample for 22 (or at least 22) of the 52 exons. In some embodiments, the methods of the present disclosure comprise assaying the sample for 23 (or at least 23) of the 52 exons. In some embodiments, the methods of the present disclosure comprise assaying the sample for 24 (or at least 24) of the 52 exons. In some embodiments, the methods of the present disclosure comprise assaying the sample for 25 (or at least 25) of the 52 exons. In some embodiments, the methods of the present disclosure comprise assaying the sample for 26 (or at least 26) of the 52 exons. In some embodiments, the methods of the present disclosure comprise assaying the sample for 27 (or at least 27) of the 52 exons. In some embodiments, the methods of the present disclosure comprise assaying the sample for 28 (or at least 28) of the 52 exons. In some embodiments, the methods of the present disclosure comprise assaying the sample for 29 (or at least 29) of the 52 exons. In some embodiments, the methods of the present disclosure comprise assaying the sample for 30 (or at least 30) of the 52 exons. In some embodiments, the methods of the present disclosure comprise assaying the sample for 31 (or at least 31) of the 52 exons. In some embodiments, the methods of the present disclosure comprise assaying the sample for 32 (or at least 32) of the 52 exons. In some embodiments, the methods of the present disclosure comprise assaying the sample for 33 (or at least 33) of the 52 exons. In some embodiments, the methods of the present disclosure comprise assaying the sample for 34 (or at least 34) of the 52 exons. In some embodiments, the methods of the present disclosure comprise assaying the sample for 35 (or at least 35) of the 52 exons. In some embodiments, the methods of the present disclosure comprise assaying the sample for 36 (or at least 36) of the 52 exons. In some embodiments, the methods of the present disclosure comprise assaying the sample for 37 (or at least 37) of the 52 exons. In some embodiments, the methods of the present disclosure comprise assaying the sample for 38 (or at least 38) of the 52 exons. In some embodiments, the methods of the present disclosure comprise assaying the sample for 39 (or at least 39) of the 52 exons. In some embodiments, the methods of the present disclosure comprise assaying the sample for 40 (or at least 40) of the 52 exons. In some embodiments, the methods of the present disclosure comprise assaying the sample for 41 (or at least 41) of the 52 exons. In some embodiments, the methods of the present disclosure comprise assaying the sample for 42 (or at least 42) of the 52 exons. In some embodiments, the methods of the present disclosure comprise assaying the sample for 43 (or at least 43) of the 52 exons. In some embodiments, the methods of the present disclosure comprise assaying the sample for 44 (or at least 44) of the 52 exons. In some embodiments, the methods of the present disclosure comprise assaying the sample for 45 (or at least 45) of the 52 exons. In some embodiments, the methods of the present disclosure comprise assaying the sample for 46 (or at least 46) of the 52 exons. In some embodiments, the methods of the present disclosure comprise assaying the sample for 47 (or at least 47) of the 52 exons. In some embodiments, the methods of the present disclosure comprise assaying the sample for 48 (or at least 48) of the 52 exons. In some embodiments, the methods of the present disclosure comprise assaying the sample for 49 (or at least 49) of the 52 exons. In some embodiments, the methods of the present disclosure comprise assaying the sample for 50 (or at least 50) of the 52 exons. In some embodiments, the methods of the present disclosure comprise assaying the sample for 51 (or at least 51) of the 52 exons. In some embodiments, the methods of the present disclosure comprise assaying the sample for 52 exons.

It should be understood that a method “comprising assaying the sample for fifty-two (52) exon splicing isoforms (e.g., exon inclusion or exon exclusion), each comprising a different nucleotide sequence of SEQ ID NOS: 21-40 and 105-136” is a method that comprises assaying for all 52 isoforms provided in Table 1, Table 2, and Table 3.
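The count of 52 follows directly from the two SEQ ID NO ranges: 21-40 contributes 20 sequences and 105-136 contributes 32. As a minimal sketch, the Python snippet below enumerates the panel and validates a requested subset against it. The names `PANEL` and `validate_panel_subset` are hypothetical and chosen for illustration only.

```python
# SEQ ID NO ranges taken from the text: 21-40 and 105-136 (both inclusive).
PANEL = list(range(21, 41)) + list(range(105, 137))


def validate_panel_subset(seq_ids):
    """Return the requested SEQ ID NOs, deduplicated and sorted.

    Raises ValueError if any requested ID is not one of the 52 panel members.
    """
    requested = set(seq_ids)
    unknown = sorted(requested - set(PANEL))
    if unknown:
        raise ValueError(f"SEQ ID NOs not in the 52-member panel: {unknown}")
    return sorted(requested)


# 20 sequences (21-40) plus 32 sequences (105-136) give the 52 isoforms.
assert len(PANEL) == 52
```

A subset such as `validate_panel_subset([22, 105, 136])` corresponds to assaying for 3 of the 52 exons.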

Not every sample will have more than one exon splicing isoform (e.g., exon inclusion or exon exclusion) of the present disclosure. In many embodiments, only one of the exon splicing isoforms of the present disclosure will be detected in a sample. Nonetheless, a sample may be assayed for one or more (e.g., 1 to 52) of the 52 exon splicing isoforms. For example, a single sample may include only the exon splicing isoform comprising the sequence of SEQ ID NO: 1 or SEQ ID NO: 21. All 52 or a subset of the 52 (fewer than 52) of the exon splicing isoforms of Table 1, Table 2, and Table 3 may be assayed in order to detect that exon splicing isoform comprising the sequence of SEQ ID NO: 1 or SEQ ID NO: 21.

It should also be understood that the step of “assaying for an exon splicing isoform(s) (e.g., exon inclusion or exon exclusion)” or “assaying for a nucleic acid” encompasses assaying for mRNA comprising the exon splicing isoform(s) or assaying for complementary DNA (cDNA) comprising the exon splicing isoform(s) (e.g., comprising the sequence of any one of SEQ ID NOS: 21-40 and 105-136). As is known in the art, cDNA is synthesized from mRNA.

Examples of Nucleic Acid Detection Assays

There are many different known methods for assaying a sample for the presence or absence of a particular nucleotide sequence, any of which may be used in accordance with the present disclosure. For example, standard polymerase chain reaction (PCR) methods (e.g., reverse transcription PCR (RT-PCR)) may be performed using mRNA obtained from a sample. In RT-PCR, the RNA template is first converted into a complementary DNA (cDNA) using a reverse transcriptase. The cDNA is then used as a template for exponential amplification using PCR. Thus, kits provided herein may include any one or more reagents used in a PCR such as, for example, primers or probes that bind to a particular nucleic acid comprising an exon splicing event (e.g., exon inclusion or exon exclusion), polymerases, buffers, deoxyribonucleotide triphosphates (dNTPs), and salts.

In some embodiments, an Archer® FusionPlex® assay is used to assay for a nucleotide sequence (e.g., exon). This assay may include using custom-designed probes with an Anchored Multiplexed PCR (AMP™) followed by next generation sequencing (NGS) (e.g., with an Illumina® platform). Thus, kits provided herein may include any one or more reagents used in an Archer® FusionPlex® assay.

In other embodiments, targeted sequencing using long-read sequencing technology (e.g., PacBio®, built on Single Molecule, Real-Time (SMRT) Sequencing technology) is used to assay for a nucleotide sequence (e.g., exon). Thus, kits provided herein may include any one or more reagents used in a long-read sequencing technology.

In other embodiments, Droplet Digital™ PCR (ddPCR™) (BioRad®) is used to assay for a nucleotide sequence (e.g., exon). For example, combinations of primers and probes may be designed to detect selected exon splicing isoforms in single cell suspension or in cells isolated from frozen tumor tissues, e.g., using Laser Capture Microdissection. More than one isoform may be detected in the single cell, for example. Thus, kits provided herein may include any one or more reagents used in a Droplet Digital™ PCR (ddPCR™) assay.

In yet other embodiments, ViewRNA™ In Situ Hybridization (ISH) (Thermo Fisher Scientific) may be used to assay for a nucleotide sequence (e.g., exon). For example, splice junction probes may be designed to enable specific detection of the exon splicing isoforms of the present disclosure in tissue sections (e.g., breast cancer tissue sections) through Fluorescent In Situ Hybridization (FISH). More than one isoform may be detected in the same cell, for example. Thus, kits provided herein may include any one or more reagents used in an ISH assay.

In still other embodiments, nCounter® technology (nanoString™) is used to assay for a nucleotide sequence (e.g., exon). For example, the nCounter® Analysis System utilizes a novel digital barcode technology for direct multiplexed measurement of analytes and offers high levels of precision and sensitivity (<1 copy per cell). The technology uses molecular “barcodes” and single molecule imaging for the direct hybridization and detection of hundreds of unique transcripts in a single reaction. Each color-coded barcode is attached to a single target-specific probe corresponding to an analyte (e.g., exon) of interest. Combined together with invariant controls, the probes form a multiplexed CodeSet. Thus, kits provided herein may include any one or more reagents used in an nCounter® assay or other nanoString™ nucleic acid detection assay.

Other nucleic acid detection methods may be used.

Probes

Some aspects of the present disclosure comprise assaying a sample for the presence or absence of a nucleic acid (e.g., an exon inclusion event) comprising a nucleotide sequence of any one of SEQ ID NOS: 1-20, each of which includes an exon inclusion event as well as a sequence directly upstream from and a sequence directly downstream from the exon inclusion event (any one of SEQ ID NOS: 21-40). Some aspects of the present disclosure comprise assaying a sample for the presence or absence of a nucleic acid (e.g., an exon exclusion event) comprising a nucleotide sequence of any one of SEQ ID NOS: 41-72, each of which includes an exon exclusion event as well as a sequence directly upstream from and a sequence directly downstream from the exon exclusion event (any one of SEQ ID NOS: 105-136).

A probe is a synthetic (non-naturally-occurring) nucleic acid that is wholly or partially complementary to and thus binds to a nucleic acid of interest (e.g., a nucleic acid comprising or comprised within a nucleotide sequence of any one of SEQ ID NOS: 1-20, 21-40, 41-72, or 105-136). In some embodiments, a probe comprises DNA. In some embodiments, a probe comprises RNA. In some embodiments, a probe comprises DNA and RNA. It should be understood that the term “probe” encompasses “primer,” which, as is known in the art, is a synthetic nucleic acid (e.g., DNA) used as a starting point for nucleic acid (e.g., DNA) synthesis. The length of a probe may vary, depending on the nucleic acid detection assay being used. For example, a probe may have a length of at least 15, at least 18, at least 20, at least 25, at least 30, at least 40, at least 50, at least 60, at least 70, at least 80, at least 90, or at least 100 nucleotides. In some embodiments, a probe has a length of 15 to 30 nucleotides, 15 to 50 nucleotides, or 15 to 100 nucleotides. Depending on the application, a probe may be longer than 100 nucleotides.

In some embodiments, one or more probes are designed to bind directly to an exon (e.g., exon inclusion event or exon exclusion event) of any one of SEQ ID NOS: 21-40 and 105-136. The probe may bind, for example, to a 5′ region, a central region, or a 3′ region of an exon.

In some embodiments, one or more probes are designed to bind to a nucleotide sequence directly upstream (5′) from an exon of any one of SEQ ID NOS: 21-40 and 105-136. In other embodiments, one or more probes are designed to bind to a nucleotide sequence directly downstream (3′) from an exon of any one of SEQ ID NOS: 21-40 and 105-136. In some embodiments, a first probe (e.g., primer) of a pair of probes is designed to bind to a nucleotide sequence directly upstream (5′) from an exon of any one of SEQ ID NOS: 21-40 and 105-136, and a second probe (e.g., primer) of the pair of probes is designed to bind to a nucleotide sequence directly downstream (3′) from an exon of any one of SEQ ID NOS: 21-40 and 105-136 such that the pair of probes flank the exon.
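One way to picture a flanking primer pair is sketched below in Python. The sketch assumes hypothetical toy sequences and a fixed primer length, places the forward primer at the 5′ end of the upstream sequence and the reverse primer at the 3′ end of the downstream sequence, and computes the expected amplicon length with and without the exon. The function names and the 20-nucleotide default are assumptions for illustration and are not taken from the disclosure; practical primer design also weighs melting temperature, specificity, and secondary structure.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def flanking_primers(upstream: str, downstream: str, primer_len: int = 20):
    """Pick a primer pair that flanks the exon.

    The forward primer matches the first primer_len nucleotides of the upstream
    sequence; the reverse primer is the reverse complement of the last
    primer_len nucleotides of the downstream sequence.
    """
    if len(upstream) < primer_len or len(downstream) < primer_len:
        raise ValueError("flanking sequences must be at least primer_len long")
    forward = upstream[:primer_len].upper()
    reverse = reverse_complement(downstream[-primer_len:])
    return forward, reverse


def amplicon_lengths(upstream: str, exon: str, downstream: str):
    """Expected product sizes for the exon-included and exon-excluded isoforms,
    given primers at the outer ends of the flanking sequences."""
    included = len(upstream) + len(exon) + len(downstream)
    excluded = len(upstream) + len(downstream)
    return included, excluded
```

Because the primers sit outside the exon, the two isoforms yield products that differ in length by exactly the length of the exon, which is what allows inclusion and exclusion to be distinguished by product size.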

In some embodiments, one or more probes are designed to bind to an exon junction. An exon junction comprises (a) a nucleotide sequence that includes a 5′ region of an exon (e.g., of any one of SEQ ID NOS: 21-40 and 105-136) and a nucleotide sequence directly upstream from the 5′ region of the exon, or (b) a nucleotide sequence that includes a 3′ region of an exon (e.g., of any one of SEQ ID NOS: 21-40 and 105-136) and a nucleotide sequence directly downstream from the 3′ region of the exon. Table 6 provides examples of cDNA sequences that include exon inclusion events (underlined) as well as sequences directly upstream from and downstream from the exon inclusion event. Any one or more probes may be designed to bind to any region of a nucleotide sequence of Table 6 (SEQ ID NOS: 1-20), e.g., for the purpose of detecting (e.g., amplifying or labeling) the nucleotide sequence in a sample. Table 7 provides examples of cDNA sequences that include exon exclusion events (underlined) as well as sequences directly upstream from and downstream from the exon exclusion event. Any one or more probes may be designed to bind to any region of a nucleotide sequence of Table 7 (SEQ ID NOS: 41-72), e.g., for the purpose of detecting (e.g., amplifying or labeling) the nucleotide sequence in a sample.
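The two junction types (a) and (b) defined above can be sketched in Python as follows. The sketch assumes that the upstream, exon, and downstream sequences are available as separate strings, and that a junction target takes an equal number of nucleotides from each side of the boundary; the function names and the default of 12 nucleotides per side are hypothetical choices for illustration.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def junction_targets(upstream: str, exon: str, downstream: str, half: int = 12):
    """Return the (5' junction, 3' junction) target sequences.

    5' junction: last `half` nt of the upstream sequence + first `half` nt of the exon.
    3' junction: last `half` nt of the exon + first `half` nt of the downstream sequence.
    """
    if min(len(upstream), len(exon), len(downstream)) < half:
        raise ValueError("each segment must be at least `half` nucleotides long")
    five_prime = (upstream[-half:] + exon[:half]).upper()
    three_prime = (exon[-half:] + downstream[:half]).upper()
    return five_prime, three_prime


def probe_for(target: str) -> str:
    """A probe fully complementary to the target is its reverse complement."""
    return reverse_complement(target)
```

A probe built this way only hybridizes along its full length when the exon and its neighboring sequence are joined in the same transcript, which is why junction probes can report on a specific splicing event rather than on the exon alone.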

Tissue Samples

In some embodiments, the mRNA is obtained from a biological sample. Biological samples include tissue samples or fluid samples. Non-limiting examples of tissue samples include blood samples and breast tissue samples. Non-limiting examples of fluid samples include cerebrospinal fluid (CSF) samples and urine samples.

In some embodiments, the mRNA is obtained from a breast tissue sample. The breast tissue sample, in some embodiments, is obtained from a female subject (e.g., human female subject), although it may alternatively be obtained from a male subject (e.g., human male subject).

In some embodiments, the sample is obtained from a subject diagnosed with a cancer, such as breast cancer. For example, the subject may have, may be at risk of having, or may be suspected of having a cancer of a breast duct, breast lobule, or breast tissue in between the duct and lobule. Non-limiting examples of breast cancer that may be sampled include ductal carcinoma in situ, invasive ductal carcinoma, tubular carcinoma of the breast, medullary carcinoma of the breast, mucinous carcinoma of the breast, papillary carcinoma of the breast, cribriform carcinoma of the breast, invasive lobular carcinoma, inflammatory breast cancer, Paget's disease of the nipple, Phyllodes tumors of the breast, metastatic breast cancer, and triple negative breast cancer (TNBC).

Applications

Methods of the present disclosure, in some embodiments, comprise assigning a favorable prognosis or unfavorable prognosis to a cancer patient, based on the presence of a nucleic acid in the sample (e.g., an exon inclusion event or an exon exclusion event) comprising a nucleotide sequence (e.g., an exon) of any one of SEQ ID NOS: 21-40 and 105-136. Thus, in some embodiments, methods herein comprise obtaining a sample from a subject, assaying the sample for a nucleic acid comprising a nucleotide sequence of any one of SEQ ID NOS: 21-40 and 105-136, and assigning a favorable prognosis or unfavorable prognosis to the sample/patient (e.g., breast tissue sample) (see, e.g., Table 4 or Table 5). In some embodiments, a nucleic acid comprising a nucleotide sequence of any one of SEQ ID NOS: 21-40 or 105-136 is detected in the sample obtained from the patient.

In some embodiments, a favorable prognosis is assigned to the sample when a nucleic acid comprising a nucleotide sequence of any one of SEQ ID NOS: 24, 28, 31, 33, 38, 114, 119, or 123 is detected. In some embodiments, a favorable prognosis is an at least 70% probability of surviving at least 2000 days. In some embodiments, a favorable prognosis is an at least 75% probability of surviving at least 2000 days. In some embodiments, a favorable prognosis is an at least 70% probability of surviving at least 4000 days. In some embodiments, a favorable prognosis is an at least 75% probability of surviving at least 4000 days.

In other embodiments, an unfavorable prognosis is assigned to the sample when a nucleic acid comprising a nucleotide sequence of any one of SEQ ID NOS: 21-27, 29, 30, 32, 34-37, 39, 40, 105-113, 115-118, 120-122, or 124-136 is detected. In some embodiments, an unfavorable prognosis is an at least 75% probability of surviving less than 2000 days.
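The assignment described in the two preceding paragraphs amounts to a lookup from a detected SEQ ID NO to a prognosis label. The Python sketch below encodes the two lists exactly as they are recited above; the helper names are hypothetical. As recited, SEQ ID NO: 24 falls within both lists (it is named in the favorable list and lies inside the range 21-27 of the unfavorable list), so the sketch returns every label recited for a sequence rather than choosing between them.

```python
def expand(spec: str) -> set:
    """Expand a list such as '21-27, 29' into the set {21, ..., 27, 29}."""
    ids = set()
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            low, high = part.split("-")
            ids.update(range(int(low), int(high) + 1))
        else:
            ids.add(int(part))
    return ids


# Lists copied from the text above.
FAVORABLE = expand("24, 28, 31, 33, 38, 114, 119, 123")
UNFAVORABLE = expand(
    "21-27, 29, 30, 32, 34-37, 39, 40, 105-113, 115-118, 120-122, 124-136"
)


def prognosis_labels(detected_seq_ids):
    """Map each detected SEQ ID NO to the set of prognosis labels recited for it.

    An empty set means the sequence is in neither list.
    """
    result = {}
    for seq_id in detected_seq_ids:
        labels = set()
        if seq_id in FAVORABLE:
            labels.add("favorable")
        if seq_id in UNFAVORABLE:
            labels.add("unfavorable")
        result[seq_id] = labels
    return result
```

For example, `prognosis_labels([28, 21])` labels SEQ ID NO: 28 as favorable and SEQ ID NO: 21 as unfavorable.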

Additional Embodiments

1. A complementary deoxyribonucleic acid (cDNA) comprising a nucleotide sequence of any one of SEQ ID NOS: 22-24, 27-34, 36, 38, or 40.
2. A composition comprising the cDNA of paragraph 1.
3. A composition comprising at least two cDNAs of paragraph 1.
4. The composition of paragraph 2 or 3 further comprising a cDNA comprising a nucleotide sequence of any one of SEQ ID NOS: 21, 25, 26, 35, 37, or 39.
5. The composition of paragraph 2 or 4 comprising a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 21, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 22, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 23, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 24, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 25, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 26, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 27, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 28, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 29, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 30, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 31, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 32, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 33, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 34, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 35, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 36, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 37, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 38, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 39, and a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 40.
6. The composition of paragraph 2 further comprising a probe that binds to the cDNA, or a pair of primers that bind to the cDNA.
7. The composition of any one of paragraphs 2-6, wherein the cDNA is synthesized from messenger ribonucleic acid (mRNA) obtained from a tissue sample, optionally a breast tissue sample.
8. The composition of paragraph 7, wherein the breast tissue sample is obtained from a female subject.
9. The composition of paragraph 7 or 8, wherein the sample is obtained from a subject diagnosed with a cancer.
10. The composition of paragraph 7 or 8, wherein the sample is obtained from a subject at risk of having a cancer or suspected of having a cancer.
11. A method comprising assaying a sample for a nucleic acid comprising a nucleotide sequence of any one of SEQ ID NOS: 22-24, 27-34, 36, 38, or 40.
12. The method of paragraph 11 further comprising assaying the sample for a nucleic acid comprising a nucleotide sequence of any one of SEQ ID NOS: 21, 25, 26, 35, 37, or 39.
13. The method of paragraph 11 comprising assaying the sample for a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 21, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 22, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 23, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 24, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 25, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 26, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 27, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 28, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 29, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 30, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 31, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 32, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 33, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 34, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 35, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 36, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 37, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 38, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 39, and a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 40.
14. The method of any one of paragraphs 11-13, wherein the nucleic acid is a messenger ribonucleic acid (mRNA), optionally obtained from a breast tissue sample.
15. The method of any one of paragraphs 11-13, wherein the nucleic acid is a complementary deoxyribonucleic acid (cDNA) synthesized from mRNA obtained from a breast tissue sample.
16. 
The method of paragraph 14 or 15, wherein the breast tissue        sample is obtained from a female subject.    -   17. The method of any one of paragraphs 14-16, wherein the        breast tissue sample is obtained from a subject diagnosed with a        cancer.    -   18. The method of any one of paragraphs 14-16, wherein the        breast tissue sample is obtained from a subject at risk of        having a cancer or suspected of having a cancer.    -   19. The method of any one of paragraphs 11-18 further comprising        detecting a nucleic acid comprising a nucleotide sequence of any        one of SEQ ID NOS: 21-40.    -   20. The method of any one of paragraphs 11-19, wherein the        nucleic acid is a mRNA.    -   21. The method of any one of paragraphs 11-19, wherein the        nucleic acid is a cDNA.    -   22. The method of any one of paragraphs 19-21 further comprising        assigning to the subject from whom the sample was obtained a        favorable prognosis or an unfavorable prognosis.    -   23. The method of paragraph 22, wherein a favorable prognosis is        assigned to the subject from whom the sample was obtained if a        nucleic acid comprising a nucleotide sequence of any one of SEQ        ID NOS: 24, 28, 21, 33, or 38 is detected.    -   24. The method of paragraph 22, wherein an unfavorable prognosis        is assigned to the subject from whom the sample was obtained if        a nucleic acid comprising a nucleotide sequence of any one of        SEQ ID NOS: 21-27, 29, 30, 32, 34-37, 39, or 40 is detected.    -   25. A method comprising:    -   obtaining a sample from a subject;    -   assaying the sample for a nucleic acid comprising a nucleotide        sequence of any one of SEQ ID NOS: 21-40; and    -   assigning a favorable prognosis or unfavorable prognosis to the        subject.    -   26. 
The method of paragraph 25 further comprising detecting in        the sample a nucleic acid comprising a nucleotide sequence of        any one of SEQ ID NOS: 21-40.    -   27. The method of paragraph 26, wherein the sample is a breast        tissue sample.    -   28. The method of any one of paragraphs 25-27, wherein the        assaying step comprising assaying the sample for a nucleic acid        comprising a nucleotide sequence of SEQ ID NO: 21, a nucleic        acid comprising a nucleotide sequence of SEQ ID NO: 22, a        nucleic acid comprising a nucleotide sequence of SEQ ID NO: 23,        a nucleic acid comprising a nucleotide sequence of SEQ ID NO:        24, a nucleic acid comprising a nucleotide sequence of SEQ ID        NO: 25, a nucleic acid comprising a nucleotide sequence of SEQ        ID NO: 26, a nucleic acid comprising a nucleotide sequence of        SEQ ID NO: 27, a nucleic acid comprising a nucleotide sequence        of SEQ ID NO: 28, a nucleic acid comprising a nucleotide        sequence of SEQ ID NO: 29, a nucleic acid comprising a        nucleotide sequence of SEQ ID NO: 30, a nucleic acid comprising        a nucleotide sequence of SEQ ID NO: 31, a nucleic acid        comprising a nucleotide sequence of SEQ ID NO: 32, a nucleic        acid comprising a nucleotide sequence of SEQ ID NO: 33, a        nucleic acid comprising a nucleotide sequence of SEQ ID NO: 34,        a nucleic acid comprising a nucleotide sequence of SEQ ID NO:        35, a nucleic acid comprising a nucleotide sequence of SEQ ID        NO: 36, a nucleic acid comprising a nucleotide sequence of SEQ        ID NO: 37, a nucleic acid comprising a nucleotide sequence of        SEQ ID NO: 38, a nucleic acid comprising a nucleotide sequence        of SEQ ID NO: 39, and a nucleic acid comprising a nucleotide        sequence of SEQ ID NO: 40.    -   28. 
The method of any one of paragraphs 25-27, wherein a        favorable prognosis is assigned to the subject from whom the        sample was obtained if a nucleic acid comprising a nucleotide        sequence of any one of SEQ ID NOS: 24, 28, 21, 33, or 38 is        detected.    -   29. The method of any one of paragraphs 25-27, wherein an        unfavorable prognosis is assigned to the subject from whom the        sample was obtained if a nucleic acid comprising a nucleotide        sequence of any one of SEQ ID NOS: 21-27, 29, 30, 32, 34-37, 39,        or 40 is detected.    -   30. A kit comprising: a probe comprising a nucleotide sequence        complementary to a nucleotide sequence of any one of SEQ ID NOS:        1-20; and at least one reagent for detecting a nucleic acid        selected from buffers, salts, polymerases, and        deoxyribonucleotide triphosphates (dNTPs).    -   31. A kit comprising:    -   a probe comprising a nucleotide sequence complementary to a        nucleotide sequence of SEQ ID NO: 1, a probe comprising a        nucleotide sequence complementary to a nucleotide sequence of        SEQ ID NO: 2, a probe comprising a nucleotide sequence        complementary to a nucleotide sequence of SEQ ID NO: 3, a probe        comprising a nucleotide sequence complementary to a nucleotide        sequence of SEQ ID NO: 4, a probe comprising a nucleotide        sequence complementary to a nucleotide sequence of SEQ ID NO: 5,        a probe comprising a nucleotide sequence complementary to a        nucleotide sequence of SEQ ID NO: 6, a probe comprising a        nucleotide sequence complementary to a nucleotide sequence of        SEQ ID NO: 7, a probe comprising a nucleotide sequence        complementary to a nucleotide sequence of SEQ ID NO: 8, a probe        comprising a nucleotide sequence complementary to a nucleotide        sequence of SEQ ID NO: 9, a probe comprising a nucleotide        sequence complementary to a nucleotide sequence of SEQ ID NO:    
    10, a probe comprising a nucleotide sequence complementary to a        nucleotide sequence of SEQ ID NO: 11, a probe comprising a        nucleotide sequence complementary to a nucleotide sequence of        SEQ ID NO: 12, a probe comprising a nucleotide sequence        complementary to a nucleotide sequence of SEQ ID NO: 13, a probe        comprising a nucleotide sequence complementary to a nucleotide        sequence of SEQ ID NO: 14, a probe comprising a nucleotide        sequence complementary to a nucleotide sequence of SEQ ID NO:        15, a probe comprising a nucleotide sequence complementary to a        nucleotide sequence of SEQ ID NO: 16, a probe comprising a        nucleotide sequence complementary to a nucleotide sequence of        SEQ ID NO: 17, a probe comprising a nucleotide sequence        complementary to a nucleotide sequence of SEQ ID NO: 18, a probe        comprising a nucleotide sequence complementary to a nucleotide        sequence of SEQ ID NO: 19, and a probe comprising a nucleotide        sequence complementary to a nucleotide sequence of SEQ ID NO:        20.    -   32. 
A kit comprising:    -   a probe comprising a nucleotide sequence complementary to a        nucleotide sequence of SEQ ID NO: 21, a probe comprising a        nucleotide sequence complementary to a nucleotide sequence of        SEQ ID NO: 22, a probe comprising a nucleotide sequence        complementary to a nucleotide sequence of SEQ ID NO: 23, a probe        comprising a nucleotide sequence complementary to a nucleotide        sequence of SEQ ID NO: 24, a probe comprising a nucleotide        sequence complementary to a nucleotide sequence of SEQ ID NO:        25, a probe comprising a nucleotide sequence complementary to a        nucleotide sequence of SEQ ID NO: 26, a probe comprising a        nucleotide sequence complementary to a nucleotide sequence of        SEQ ID NO: 27, a probe comprising a nucleotide sequence        complementary to a nucleotide sequence of SEQ ID NO: 28, a probe        comprising a nucleotide sequence complementary to a nucleotide        sequence of SEQ ID NO: 29, a probe comprising a nucleotide        sequence complementary to a nucleotide sequence of SEQ ID NO:        30, a probe comprising a nucleotide sequence complementary to a        nucleotide sequence of SEQ ID NO: 31, a probe comprising a        nucleotide sequence complementary to a nucleotide sequence of        SEQ ID NO: 32, a probe comprising a nucleotide sequence        complementary to a nucleotide sequence of SEQ ID NO: 33, a probe        comprising a nucleotide sequence complementary to a nucleotide        sequence of SEQ ID NO: 34, a probe comprising a nucleotide        sequence complementary to a nucleotide sequence of SEQ ID NO:        35, a probe comprising a nucleotide sequence complementary to a        nucleotide sequence of SEQ ID NO: 36, a probe comprising a        nucleotide sequence complementary to a nucleotide sequence of        SEQ ID NO: 37, a probe comprising a nucleotide sequence        complementary to a nucleotide sequence of SEQ ID NO: 38, a probe        
comprising a nucleotide sequence complementary to a nucleotide        sequence of SEQ ID NO: 39, and a probe comprising a nucleotide        sequence complementary to a nucleotide sequence of SEQ ID NO:        40.    -   33. The kit of paragraph 31 or 32, wherein the kit further        comprises at least one reagent for detecting a nucleic acid        selected from buffers, salts, polymerases, and        deoxyribonucleotide triphosphates (dNTPs).

EXAMPLES

Example 1

Alternative splicing is a biological phenomenon that increases transcript and protein diversity. In one type of alternative splicing, referred to as “exon skipping,” exons are either spliced “in” or spliced “out” of the transcript based on cellular conditions (FIG. 55).

Due to alternative splicing, different transcript isoforms (exon configurations) of the same gene might be expressed in tumor and normal samples. Therefore, even though a gene is expressed in both tumor and normal tissues, transcripts might harbor an exon configuration that is distinctive to cancer.

A conventional approach for identification of cancer biomarkers is based on gene expression. Researchers aim to detect whether a gene is specifically expressed in tumors using microarrays or RNA sequencing. We took a splicing-based approach rather than a gene-based approach to identify cancer biomarkers.

Methods

To identify splicing biomarkers in cancer, we took the steps outlined below, i.e., (i) Transcript sequencing, (ii) TCGA analysis, and (iii) Clustering analysis using a novel methodology to identify splicing-based biomarkers.

Sequencing: Long read sequencing using PacBio® Single Molecule Real Time Sequencing (SMRT) technology. This technology is capable of sequencing full-length cDNA transcripts without the need for cDNA fragmentation, and therefore can be used to directly infer the connectivity of exons in transcripts without the need for computational reconstruction. We used this technology to sequence transcripts in 81 cancer and tumor samples. We obtained 298K transcripts corresponding to ~14K genes, yielding a median of 8 30 isoforms per gene. This represents a ~2-fold increase over the public human reference transcriptome (Gencode version 25) for that set of genes. This set of transcripts is called the PacBio® Transcriptome.

Data Analysis Step 1, TCGA analysis: Quantification of exon skipping events in a large cohort of breast cancer patients available from TCGA using the PacBio® Transcriptome as background. The aim of the step is to compute percent spliced-in (PSI) for exons undergoing alternative splicing. This step was performed using the rMATS software. rMATS identified 67,255 skipping events in the PacBio® transcriptome, and computed the PSI levels for each of those exons across all samples (n=1,748, including 1,111 breast cancer tumors and 637 normal). Given the size of the TCGA sequencing data, this step was performed using the ISB Cancer Genomics Cloud (Google Cloud) platform.
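The PSI value computed in this step can be illustrated with a minimal sketch. It assumes the commonly used length-normalized read-count formulation, in which reads supporting exon inclusion and reads supporting exon skipping are each divided by an effective length before the ratio is taken. The function name, parameter names, and read counts are hypothetical and are not taken from the rMATS software or from the data described herein.

```python
def percent_spliced_in(inc_reads, skip_reads, inc_len=1.0, skip_len=1.0):
    """Return PSI in [0, 1], or None when no reads support either isoform.

    inc_reads / skip_reads: read counts supporting inclusion / skipping.
    inc_len / skip_len: effective lengths used for normalization (assumed inputs).
    """
    if inc_reads < 0 or skip_reads < 0:
        raise ValueError("read counts must be non-negative")
    inc = inc_reads / inc_len      # length-normalized inclusion support
    skip = skip_reads / skip_len   # length-normalized skipping support
    total = inc + skip
    if total == 0:
        return None                # PSI is undefined without coverage
    return inc / total


# Hypothetical example: 30 inclusion reads and 70 skipping reads, equal lengths.
psi = percent_spliced_in(30, 70)
```

With equal effective lengths the value reduces to the fraction of reads that support inclusion of the target exon.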

Data Analysis Step 2, Clustering: Apply a methodology of the present disclosure called ts3 (Tumor Specific Splice Site Detection) to find exons that are included (e.g., spliced in) and excluded (spliced out) only in cancer (FIG. 55). This is accomplished by using a clustering approach based on Gaussian mixture modeling (GMM).
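The clustering idea can be sketched as a one-dimensional Gaussian mixture fitted to the PSI values of a single splicing event by expectation-maximization. This is only an illustration of GMM clustering in general, not the ts3 implementation: the number of components, the quantile-based initialization, the variance floor, and the PSI values are all assumptions introduced for the example.

```python
import math


def _log_density(x, w, m, v):
    # Log of (weight * Gaussian density) for one mixture component.
    return math.log(w) - 0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)


def fit_gmm_1d(values, k=2, n_iter=200, var_floor=1e-4):
    """Fit a k-component 1-D Gaussian mixture by EM.

    Returns (weights, means, variances). Initialization is deterministic:
    means start at evenly spaced quantiles and share the pooled variance.
    """
    xs = sorted(values)
    n = len(xs)
    means = [xs[int((j + 0.5) * n / k)] for j in range(k)]
    grand_mean = sum(xs) / n
    pooled_var = max(sum((x - grand_mean) ** 2 for x in xs) / n, var_floor)
    variances = [pooled_var] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: responsibility of each component for each PSI value.
        resp = []
        for x in xs:
            logs = [_log_density(x, w, m, v)
                    for w, m, v in zip(weights, means, variances)]
            top = max(logs)
            dens = [math.exp(l - top) for l in logs]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate weight, mean, and variance of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-9:
                continue  # leave an empty component unchanged
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var_j, var_floor)
    return weights, means, variances


def assign_cluster(x, weights, means, variances):
    """Return the index of the most probable component for PSI value x."""
    logs = [_log_density(x, w, m, v) for w, m, v in zip(weights, means, variances)]
    return logs.index(max(logs))


# Hypothetical PSI values: most samples near zero, a subpopulation near 0.25.
psi_values = [0.00, 0.01, 0.02, 0.03, 0.01, 0.02, 0.00, 0.03, 0.02, 0.01,
              0.22, 0.25, 0.28, 0.24, 0.26]
weights, means, variances = fit_gmm_1d(psi_values, k=2)
```

Each sample is then assigned to its most probable component, and each resulting cluster can be examined for tumor specificity.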

Results

We applied our methodology based on Gaussian mixture modeling to identify exon splicing events specific to breast cancer patients from the TCGA cohort. As a result, we identified 20 exon inclusion events (spliced “in” exons) that are specifically expressed in cancer and have prognostic power. These exon inclusion events have the following properties:

-   -   Target exon has increased PSI levels (expression) compared to normal tissues (PSI_(tumor)−PSI_(normal)>10%),
    -   Target exon is low or absent in normal tissues (PSI_(normal)<5%),
    -   Splicing event is reliably detected in at least 30 breast cancer patients (coverage of at least 10 RNA-Seq reads in each patient),
    -   Patients harboring these exon inclusion events have favorable or unfavorable survival prognosis (p<0.05, logrank test).

We also identified 32 exon exclusion events (spliced “out” exons) that are specific to breast cancer and have prognostic power. These exon exclusion events have the following properties:

-   -   Target exon has decreased PSI levels (expression) compared to normal tissues (PSI_(tumor)−PSI_(normal)<−10%),
    -   Target exon is high in normal tissues (PSI_(normal)>95%),
    -   Splicing event is reliably detected in at least 30 breast cancer patients (coverage of at least 10 RNA-Seq reads in each patient),
    -   Patients harboring these exon exclusion events have favorable or unfavorable survival prognosis (p<0.05, logrank test).
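The two sets of properties listed above can be expressed as a simple filter. The sketch below is illustrative only. It assumes that PSI_(tumor) is summarized as the mean PSI over tumor samples with sufficient read coverage, and it reads the exclusion criterion as a decrease of more than 10 percentage points relative to normal tissue. The function name, argument layout, and example values are hypothetical.

```python
from statistics import mean


def candidate_event_type(tumor_samples, normal_psi_mean, logrank_p,
                         min_patients=30, min_reads=10, delta=0.10,
                         low_normal=0.05, high_normal=0.95, alpha=0.05):
    """Classify a splicing event as 'inclusion', 'exclusion', or None.

    tumor_samples: list of (psi, read_count) pairs, one per tumor sample.
    normal_psi_mean: summary PSI of the target exon in normal tissue.
    logrank_p: p-value of a survival comparison computed elsewhere.
    """
    # Keep only tumor samples in which the event is reliably covered.
    covered = [psi for psi, reads in tumor_samples if reads >= min_reads]
    if len(covered) < min_patients or logrank_p >= alpha:
        return None
    diff = mean(covered) - normal_psi_mean
    if diff > delta and normal_psi_mean < low_normal:
        return "inclusion"   # exon gained in tumor, near absent in normal
    if diff < -delta and normal_psi_mean > high_normal:
        return "exclusion"   # exon lost in tumor, near constitutive in normal
    return None


# Hypothetical event: 40 covered tumor samples with PSI 0.30, normal PSI 0.02.
label = candidate_event_type([(0.30, 25)] * 40, 0.02, logrank_p=0.01)
```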

Because they are specific to cancer, these exon events are referred to as “exon inclusion biomarkers” or “exon exclusion biomarkers.”

The exon splicing sequences were identified using long read SMRT PacBio® sequencing (see, e.g., Rhoads A et al. Genomics Proteomics Bioinformatics 2015; 13: 278-289, and Huddleston J et al. Genome Research 2014; 24: 688-696).

We found 2 types of exon splicing biomarkers, with favorable and unfavorable prognosis. Table 1 indicates that 15 exon inclusion events have unfavorable prognosis (worse outcome, lower survival time), and 5 exon inclusion events have favorable prognosis (better outcome, increased survival time). Table 2 indicates that 29 exon exclusion events have unfavorable prognosis, and 3 exon exclusion events have favorable prognosis.

TABLE 1 Exon inclusion biomarkers associated with breast cancer survival

Splicing Event ID   Gene           Prognosis     Expression EXON SEQ ID NO:
1446                CCDC115        Unfavorable   21
4322                WDR45B         Favorable     28
5134                PLEKHA6        Unfavorable   32
5696                TTC3           Unfavorable   34
6785                SPATS2         Unfavorable   39
8742                DHRS11         Unfavorable   40
13343               ENAH           Unfavorable   22
15088               POLI           Unfavorable   23
16864               PLXNB1         Favorable     24
21181               SH3GLB1        Unfavorable   25
34793               TCF25          Unfavorable   26
42420               PRR5-ARHGAP8   Unfavorable   27
44438               VPS29          Unfavorable   29
48175               E4F1           Unfavorable   30
49765               TEN1-CDK3      Favorable     31
56552               GNAZ           Favorable     33
57139               RNF8           Unfavorable   35
57874               ZDHHC13        Unfavorable   36
60615               SH3GLB2        Unfavorable   37
62560               ITFG1          Favorable     38

TABLE 2 Exon exclusion biomarkers associated with breast cancer survival

Splicing Event ID   Gene           Prognosis     Expression EXON SEQ ID NO:
1506                CENPK          Unfavorable   73
2098                METTL5         Unfavorable   74
2242                PLA2R1         Unfavorable   75
7106                RHOH           Unfavorable   76
7108                RHOH           Unfavorable   77
9442                QPRT           Unfavorable   78
10439               IL17RB         Unfavorable   79
11685               STAU1          Unfavorable   80
13451               LYRM1          Unfavorable   81
14574               PPARG          Favorable     82
16269               BORCS8-MEF2B   Unfavorable   83
16833               ENOSF1         Unfavorable   84
16929               DHRS4-AS1      Unfavorable   85
16943               NDUFV2         Unfavorable   86
18745               FER1L4         Favorable     87
19824               PHF14          Unfavorable   88
19828               PHF14          Unfavorable   89
21024               BCL2L13        Unfavorable   90
22227               SELENBP1       Favorable     91
24742               LINC00630      Unfavorable   92
27194               CTBP2          Unfavorable   93
30244               SLC52A2        Unfavorable   94
33377               SLC38A1        Unfavorable   95
40521               FAM65A         Unfavorable   96
41168               USP25          Unfavorable   97
45885               HMOX2          Unfavorable   98
50148               MKRN2OS        Unfavorable   99
52249               ATP8A2P1       Unfavorable   100
53188               HIBCH          Unfavorable   101
58853               SLC35C2        Unfavorable   102
59314               TRIM5          Unfavorable   103
60239               HSD17B6        Unfavorable   104

FIG. 1 shows the detection of the 52 exon inclusion or exon exclusion biomarkers in The Cancer Genome Atlas (TCGA) patients. Inclusion biomarkers are depicted in white, and exclusion biomarkers are depicted in black. Biomarkers with favorable prognosis are denoted “1”, while biomarkers with unfavorable prognosis are denoted “0”. These biomarkers are detected in 2-33% of patients. For instance, the splicing event 42420 affecting the PRR5-ARHGAP8 gene is present in 22% of patients, while the biomarker 15088-POLI is present in 9% of patients. Also, 91.5% of patients have at least one biomarker (754 out of 824 patients).

FIG. 2A shows that 8.5% of patients (70 patients) have no exon inclusion biomarker predictors of survival, 13.6% (112 patients) have exactly one exon biomarker predictor of survival, and 77.9% (642 patients) have more than one exon inclusion biomarker predictor of survival.

In terms of exon biomarker detection, breast cancer TCGA patients can be divided into four groups: (i) unfavorable biomarkers only (60.9% or 502 patients), (ii) favorable biomarkers only (2.9% or 24 patients), (iii) mixed unfavorable and favorable biomarkers (27.7% or 228 patients), and (iv) no detected biomarkers (8.5% or 70 patients) (FIG. 2B).

Therefore, while it is common to detect more than one biomarker in a patient, we observed that patients tend to have the same type of exon splicing biomarker (all unfavorable or all favorable). Additional work is ongoing to devise a strategy to utilize these exon biomarkers in the clinical

Example Application: Use of 52-Exon Splicing Biomarker Panel for Prognosis

We classified patients into different groups based on the outcome (unfavorable, favorable, mixed, no prediction) and the number of exon splicing biomarkers (exactly one event, more than one event). The classification is provided in Table 3. For instance, an unfavorable prognosis was assigned to 11.9% of patients on the basis of exactly one event.

TABLE 3 Exon Splicing Biomarker Outcome Prediction

Outcome         Number of exon splicing biomarkers   Number of patients   Percent Total
Unfavorable     1 event                              98                   11.9%
Unfavorable     >1 event                             404                  49%
Favorable       1 event                              14                   1.7%
Favorable       >1 event                             10                   1.2%
Mixed           >1 event                             228                  27.7%
No prediction   0 event                              70                   8.5%
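The grouping used in Table 3 can be sketched as a small function that takes the prognosis labels of the biomarkers detected in one patient and returns the outcome and event-count category. The function name and the label strings are hypothetical; the sketch only mirrors the categories named in the table and is not a clinical decision rule.

```python
def classify_patient(detected_prognoses):
    """Return (outcome, event_count_category) for one patient.

    detected_prognoses: list with one 'favorable' or 'unfavorable' label per
    exon splicing biomarker detected in the patient (empty if none detected).
    """
    allowed = {"favorable", "unfavorable"}
    if any(label not in allowed for label in detected_prognoses):
        raise ValueError("labels must be 'favorable' or 'unfavorable'")
    n = len(detected_prognoses)
    if n == 0:
        return ("No prediction", "0 event")
    kinds = set(detected_prognoses)
    if kinds == {"unfavorable"}:
        outcome = "Unfavorable"
    elif kinds == {"favorable"}:
        outcome = "Favorable"
    else:
        outcome = "Mixed"  # at least one biomarker of each type
    return (outcome, "1 event" if n == 1 else ">1 event")


# Hypothetical patient with two unfavorable biomarkers and one favorable biomarker.
group = classify_patient(["unfavorable", "unfavorable", "favorable"])
```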

TABLE 4 Genomic Location of Exon Inclusion Biomarkers

Splicing event id   Gene           Chr     Strand   Exon Target^(¶)       Exon Upstream^(¶)     Exon Downstream^(¶)   RefSeq*   Gencode v.28*
13343               ENAH           chr1    −        225595208-225595329   225567249-225567414   225600208-225600362   No        No
1446                CCDC115        chr2    −        130339560-130339701   130338250-130339232   130340908-130341039   Yes       Yes
15088               POLI           chr18   +        54272095-54272242     54271360-54271485     54273926-54274090     No        No
16864               PLXNB1         chr3    −        48413458-48413537     48413069-48413169     48413670-48413818     No        No
21181               SH3GLB1        chr1    +        86728403-86728489     86724313-86724405     86734602-86734691     Yes       Yes
34793               TCF25          chr16   +        89878461-89878627     89873578-89873859     89883351-89883512     No        Yes
42420               PRR5-ARHGAP8   chr22   +        44809006-44811304     44808307-44808438     44814672-44814758     No        No
4322                WDR45B         chr17   −        82625587-82625762     82625389-82625483     82627204-82627291     No        No
44438               VPS29          chr12   −        110498820-110499546   110496012-110496203   110502049-110502108   No        No
48175               E4F1           chr16   +        2226229-2226317       2223591-2223770       2228372-2228523       No        No
49765               TEN1-CDK3      chr17   +        75985173-75985288     75979275-75979511     75986187-75986284     No        No
5134                PLEKHA6        chr1    −        204271248-204271374   204268208-204268312   204273626-204273740   No        No
56552               GNAZ           chr22   +        23122192-23122702     23095706-23096418     23123087-23125026     No        No
5696                TTC3           chr21   +        37075936-37076066     37073269-37073364     37108392-37108446     No        No
57139               RNF8           chr6    +        37359183-37359342     37354012-37354275     37360446-37360574     No        Yes
57874               ZDHHC13        chr11   +        19124904-19125180     19117150-19117276     19142978-19143123     No        No
60615               SH3GLB2        chr9    −        129009453-129009467   129009106-129009346   129009771-129009871   Yes       Yes
62560               ITFG1          chr16   −        47450354-47450453     47428804-47428898     47451396-47451470     No        No
6785                SPATS2         chr12   +        49441730-49441816     49371228-49371290     49460770-49461037     No        Yes
8742                DHRS11         chr17   +        36593449-36593616     36591903-36592156     36594971-36595180     No        No

^(¶)Human genome build hg38
*Yes: there exists a transcript harboring 3 exons (target, upstream and downstream), as well as a transcript harboring 2 exons (upstream and downstream) reported in the database

TABLE 5 Genomic Location of Exon Exclusion Biomarkers

Splicing event id   Gene           Chr     Strand   Exon Target^(¶)       Exon Upstream^(¶)     Exon Downstream^(¶)   RefSeq*   Gencode v.28*
1506                CENPK          chr5    −        65528919-65529017     65528452-65528578     65529117-65529199     No        Yes
2098                METTL5         chr2    −        169815477-169815528   169811764-169812506   169819561-169819643   No        No
2242                PLA2R1         chr2    −        159955698-159955828   159955199-159955346   159956510-159956627   No        No
7106                RHOH           chr4    +        40197101-40197300     40193489-40193812     40242714-40242834     Yes       Yes
7108                RHOH           chr4    +        40197121-40197300     40193545-40193812     40242714-40242834     No        No
9442                QPRT           chr16   +        29695172-29695199     29694664-29695096     29696996-29697127     No        No
10439               IL17RB         chr3    +        53855294-53855341     53852871-53852997     53856844-53856986     No        No
11685               STAU1          chr20   −        49174195-49174269     49153933-49154071     49188116-49188357     Yes       Yes
13451               LYRM1          chr16   +        20915556-20915714     20902486-20902717     20920122-20920214     Yes       Yes
14574               PPARG          chr3    +        12416704-12417154     12405882-12406081     12433898-12434577     No        Yes
16269               BORCS8-MEF2B   chr19   −        19180686-19180761     19150682-19150764     19182573-19182683     No        Yes
16833               ENOSF1         chr18   −        691204-691276         690549-690631         693882-693908         No        No
16929               DHRS4-AS1      chr14   −        23953774-23954033     23940393-23941158     23954748-23955082     No        No
16943               NDUFV2         chr18   +        9115528-9115902       9103092-9103433       9117838-9117903       No        No
18745               FER1L4         chr20   −        35560163-35560364     35559341-35559627     35560540-35560638     No        No
19824               PHF14          chr7    +        11061791-11061852     11051612-11051780     11061964-11063404     No        No
19828               PHF14          chr7    +        11061791-11061851     11051612-11051780     11061964-11062085     No        No
21024               BCL2L13        chr22   +        17696141-17696210     17683214-17683321     17726677-17729133     No        No
22227               SELENBP1       chr1    −        151369004-151369189   151368199-151368319   151369713-151369769   No        No
24742               LINC00630      chrX    +        102816992-102817082   102770352-102770420   102825993-102826169   No        No
27194               CTBP2          chr10   −        125133512-125133612   125038997-125039155   125162581-125162780   No        No
30244               SLC52A2        chr8    +        144357251-144357602   144354661-144354690   144359184-144359423   No        No
33377               SLC38A1        chr12   −        46196725-46196871     46194651-46196276     46197720-46197817     No        No
40521               FAM65A         chr16   +        67544956-67545117     67544695-67544830     67545376-67545534     No        No
41168               USP25          chr21   +        15777904-15778027     15766002-15766141     15791502-15791664     No        No
45885               HMOX2          chr16   +        4483637-4483754       4474771-4474847       4505484-4505610       No        No
50148               MKRN2OS        chr3    −        12543180-12543229     12541860-12542022     12545247-12545524     No        No
52249               ATP8A2P1       chr10   +        37248118-37248396     37242758-37242847     37261864-37261925     No        No
53188               HIBCH          chr2    −        190208880-190208913   190204635-190205232   190212956-190213075   Yes       Yes
58853               SLC35C2        chr20   −        46355802-46355865     46355073-46355241     46356574-46356637     No        No
59314               TRIM5          chr11   −        5709135-5709255       5679761-5680238       5937401-5937505       No        No
60239               HSD17B6        chr12   +        56763198-56763414     56752180-56752318     56773834-56774165     No        Yes

^(¶)Human genome build hg38
*Yes: there exists a transcript harboring 3 exons (target, upstream and downstream), as well as a transcript harboring 2 exons (upstream and downstream) reported in the database

Example 2

In this example, we analyzed the splicing events listed in Table 4 and Table 5 (see FIGS. 3A-54D). The expression (expressed as PSI) of these target exons varies substantially across cancer and normal samples (see, e.g., FIG. 3A, varying from 0 (0% inclusion) to 0.3 (30% inclusion)).

Visual inspection of the data suggests the existence of a subpopulation of samples in which the target exon is included, or “spliced-in”. This subpopulation (classification “4” samples in FIG. 3A) was formally detected using a clustering methodology called Gaussian mixture modeling (GMM). The GMM analysis of splicing event 1446 (CCDC115) generated 4 subpopulations of samples (clusters).

Nonetheless, only one of the clusters (e.g., C4 of FIGS. 3A and 3B) qualifies as a tumor specific cluster, because it has the following properties:

-   -   cluster C4 contains more than 90% tumor samples (see FIG. 3B);
    -   cluster C4 has a >10% increase in expression (PSI) compared to normal (PSI_(tumor)−PSI_(normal)>10%), see FIG. 3C; and
    -   the exon inclusion event has very low or absent expression in normal tissues (PSI_(normal)<5%), see FIG. 3C.
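The three properties listed above can be checked per cluster as in the following sketch. It assumes that each sample in a cluster carries a PSI value and a tumor/normal flag, and that PSI_(tumor) for the cluster is summarized as the mean PSI of its tumor samples. The function name, thresholds passed as parameters, and sample values are hypothetical.

```python
from statistics import mean


def is_tumor_specific_cluster(cluster_samples, normal_psi_mean,
                              min_tumor_fraction=0.90, min_delta=0.10,
                              max_normal=0.05):
    """Check the three inclusion-cluster properties for one GMM cluster.

    cluster_samples: list of (psi, is_tumor) pairs assigned to the cluster.
    normal_psi_mean: summary PSI of the target exon in normal tissue.
    """
    if not cluster_samples:
        return False
    tumor_psi = [psi for psi, is_tumor in cluster_samples if is_tumor]
    tumor_fraction = len(tumor_psi) / len(cluster_samples)
    if tumor_fraction <= min_tumor_fraction:
        return False  # cluster is not dominated by tumor samples
    delta = mean(tumor_psi) - normal_psi_mean
    return delta > min_delta and normal_psi_mean < max_normal


# Hypothetical cluster: 19 tumor samples and 1 normal sample.
cluster = [(0.25, True)] * 19 + [(0.20, False)]
qualifies = is_tumor_specific_cluster(cluster, normal_psi_mean=0.02)
```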

The cluster C4 contains 97 breast cancer patients out of 824 analyzed, which means that the exon inclusion event was detected in ~12% of TCGA breast cancer patients. Moreover, survival analysis of breast cancer patients in cluster C4 versus the remaining breast cancer patients in TCGA indicates that patients in C4 (expressing the target exon) have a worse overall survival (FIG. 3D). Therefore, the exon inclusion event 1446 (CCDC115) is (i) specific to breast cancer, (ii) detected in a subpopulation of breast cancer patients, and (iii) associated with unfavorable overall survival.
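The survival comparison between patients in a cluster and the remaining patients can be illustrated with a textbook two-group log-rank test. The sketch below is a generic implementation under stated assumptions (right-censored follow-up times, a chi-square approximation with one degree of freedom); it is not the software used for FIG. 3D, and the follow-up times are invented for the example.

```python
import math


def logrank_p_value(group_a, group_b):
    """Two-group log-rank test.

    Each group is a list of (time, event) pairs, where event is True for an
    observed death and False for a censored observation. Returns the p-value
    from the chi-square approximation with 1 degree of freedom.
    """
    a, b = list(group_a), list(group_b)
    event_times = sorted({t for t, event in a + b if event})
    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        n_a = sum(1 for time, _ in a if time >= t)   # at risk in group A
        n_b = sum(1 for time, _ in b if time >= t)   # at risk in group B
        d_a = sum(1 for time, event in a if event and time == t)
        d_b = sum(1 for time, event in b if event and time == t)
        n, d = n_a + n_b, d_a + d_b
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:
            # Hypergeometric variance of the number of deaths in group A.
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0:
        return 1.0
    chi2 = observed_minus_expected ** 2 / variance
    # Survival function of a chi-square variable with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2))


# Hypothetical follow-up times (arbitrary units); all events observed.
in_cluster = [(1, True), (2, True), (3, True), (4, True), (5, True)]
remaining = [(10, True), (11, True), (12, True), (13, True), (14, True)]
p_value = logrank_p_value(in_cluster, remaining)
```

A p-value below 0.05 would correspond to the significance threshold stated for the biomarkers in Example 1.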

Furthermore, the expression (expressed as PSI) of a different target exon varies substantially across cancer and normal samples (see, e.g., FIG. 23A, varying from 0 (0% inclusion) to 1.0 (100% inclusion)).

Visual inspection of the data suggests the existence of a subpopulation of samples in which the target exon is excluded, or “spliced-out”. This subpopulation (classification “4” samples in FIG. 23A) was formally detected using a clustering methodology called Gaussian mixture modeling (GMM). The GMM analysis of splicing event 1506 (CENPK) generated 4 subpopulations of samples (clusters).

Nonetheless, only two of the clusters (e.g., C1 and C3 of FIGS. 23A and 23B) qualify as tumor specific clusters, because they have the following properties:

-   -   clusters C1 and C3 contain more than 90% tumor samples (see FIG. 23B);
    -   cluster C1 has a >10% increase in expression (PSI) compared to normal (PSI_(tumor)−PSI_(normal)>10%), see FIG. 23C; and
    -   the exon exclusion event has very low or absent expression in normal tissues (PSI_(normal)<5%), see FIG. 23C.

The cluster C1 contains 37 breast cancer patients out of 824 analyzed, which means that the exon exclusion event was detected in ~4% of TCGA breast cancer patients. Moreover, survival analysis of breast cancer patients in cluster C1 versus the remaining breast cancer patients in TCGA indicates that patients in C1 (in which the target exon is spliced out) have a worse overall survival (FIG. 23D). Therefore, the exon exclusion event 1506 (CENPK) is (i) specific to breast cancer, (ii) detected in a subpopulation of breast cancer patients, and (iii) associated with unfavorable overall survival.

TABLE 6 Exon Inclusion Event Sequences Splicing Event ID Gene NamecDNA Sequence SEQ ID NO: 1446 CCDC115 GCCTGCAGCTGGCCGCAGACATAGCCAGCCTCC1 AGAACCGCATTGACTGGGGTCGAAGCCAGCTCC The underlinedGGGGACTCCAAGAGAAACTCAAGCAGCTGGAGC exon inclusionCTGGGGCTGCCTGACATGCGCGCAAAGAGGCAG sequence isGGCAGCGAGCACAGCTGTTCTCCGACATGGCTA SEQ ID NO:CGTGATCTCAGGCCTTCTTCCTTCACAATTAGCT 21.CTTGCCCCTACCCCACGCCAGCTAATGCCCCTTC TGTGTCCCTGCTCTGCATGTTTCCATTTTCCTTAGGTGTGAAGTTTGAAGAGGCAAACAGTAATTTTG AAAGCCACTACTTTGAAACCATTCTAAGGCCTGAGTTCCCATAGGACACACTCACATAGGCAGGTA CACGTTAGTCAACAATTGGAACTGCCTCTTGGATCACTCAGCTGTGCTTTCATGGCTGGATGATGGAA CACTGTGCGAAGAGAGATGGGGGCCAGGAAGTAGCGCTTCATGCTTAGTACATCCTCCAAATTGTCT TTGCTGGAGGAGAAAACCGTACTCAGCCAAAAGATCAGGACAATATGACTTGAGTCCACAAGGACA CAAACACCTGAGTAGCTGGGCAGCCCTTGGCAGGGTCTAAGCCAGGAAGTAAAAATGATCTGGCCT AGATATTTAAGGGAACTCTAGGAAGAGGCCTAGGTTTTTAAAATCCTGTCTCTTTGTCTTACCATAAG AGGCTGAGCCTCTCTTCATTTTTTTGAAGGGCCACTTGTGTTTTCTGTTCTGGGAACTTCATTCATTTT TCTACTGGGTTGTTGATCTTTGCAGTAATTTCTAGGAGCTGTTTATGTTTGGAGGTAATTGGTCCTTT GTCCATATATATGAGATGTAAGTCTTATTTTCCAGTTTATCTTTTTGCTTATTTTTTTTGACTTTTTATT GTAAAATAAAACATCAAACTGCACAGAACAGTTGAATAGCTTAATGAATAACTACAGTAAAAGCTA TGGTAACC CCCTGCTGCTGAACAGGAGGCCGAAGACGAGAGCTGCCCGGAGGACTGGGCAGCA GCTGTTCCAGCAGAGACATCAGCAAAAGCCATCTAGAGGTGGATCCAGAGTGTGGACTAACA GAGAAAAGAAGTGGAGGGAGAGCAG GTCTGCGGAGGCGCAAGGGCCCCACTAAGACCCCAGAAC CGGAGTCCTCTGAGGCCCCTCAGGACCCCCTGAACTGGTTTGGAATCCTAGTTCCTCACAGTCTACG TCAGGCTCAAGCAAGCTTCCGGGATG 13343 ENAHTGAACAGAGTATCTGTCAGGCAAGAGCTGCTGT 2 GATGGTTTATGATGATGCCAATAAGAAGTGGGTThe underlined GCCAGCTGGTGGCTCAACTGGATTCAGCAGAGT exon inclusionTCATATCTATCACCATACAGGCAACAACACATTC sequence isAGAGTGGTGGGCAGGAAGATTCAGGACCATCAG SEQ ID NO:ACAGAGTCTCGCTCTGTTGCCCAGGCTAGAG 22. 
TGCAATGGCGTAATCTCAGCTCACTGCAACCTCCGCCTCCCGTGTTCAAGCGATTCTCCTGCCT CAGCCTCCTGAGTAGCTGGGATCACAG ACAGAGTCTGACTGTTGCCCAGGCTGGAGTGCAATGG CACCAACATGGCTCACTGCAACCTTGACCTCCTGGGCTCAAGTGATCCTCCCGGCCTCCGTCTCCCGA ATAGCGGTCTTACTCATTTTCTACGTGTGTGTTGAGTGCACCATTTGAGA 15088 POLI GAGTTCATGATCAAGTGTTGCCCACACCAAATG 3CTTCATCCAGAGTCATAGTACATGTGGATCTGGA The underlinedTTGCTTTTATGCACAAGTAGAAATGATCTCAAAT exon inclusionCCAGAGCTAAAAGACAAACCTTTAG GAAAGATT sequence isCCTCTTTTAGTGTAAGCATAAAGAACATTTTT SEQ ID NO:GGTTCACTTGCTGCTACCCTCTTGTGCCCACT 23. TTGGCTTAATAAATCCCAATCCAGCCTAGCTGATTTACTGAAGAACAAAGGGATGACTAGTTTT TGCTACGCCAAG GGGTTCAACAGAAATATTTGGTGGTTACCTGCAACTATGAAGCTAGGAAACTTG GAGTTAAGAAACTTATGAATGTCAGAGATGCAAAAGAAAAGTGTCCACAGTTGGTATTAGTTAATG GAGAAGACCTGACCCGCTACAGAGAAATGTCTTATAAGGTTACAG 16864 PLXNB1 GAGGAAGAGCAAGCAGGCCCTGAGGGACTATA 4AGAAGGTTCAGATCCAGCTGGAGAATCTGGAGA The underlinedGCAGTGTGCGGGACCGCTGCAAGAAGGAATTCA exon inclusion CAGGCCAAGTGGTCTCTGTTCAACAACTCAGC sequence isTTTGCCACTGTGGCACAAAGGCAGCCAGGGA SEQ ID NO: CGACATGGAAACACATGAAAGTGCAGATGGGG 24. AACTTGCGCTTCTCCCTGGGTCACGTGCAGTATGACGGCGAGAGCCCTGGGGCTTTTCCTGTGGCAG CCCAGGTGGGCTTGGGGGTGGGCACCTCTCTTCTGGCTCTGGGTGTCATCATCATTGTCCTCATGTAC AG 21181 SH3GLB1AAAGAAAGGAAACTATTGCAAAATAAGAGACTG 5 GATTTGGATGCTGCAAAAACGAGACTAAAAAAGThe underlined GCAAAAGCTGCAGAAACTAGAAATTCA CAACTA exon inclusionAACTCAGCTCGCCTTGAAGGAGATAACATTAT sequence isGGTAAATTTCTCTTACATGCTCAACTTCCTGC SEQ ID NO: ATGTAAAATGGCTGAAGTCTGAACAGGAATTA 25. AGAATAACTCAAAGTGAATTTGATCGTCAAGCAGAGATTACCAGACTTCTGCTAGAGGGAATCAGC AGTACACAT 34793 TCF25ACCCCGCGCGAAGAGTGCGCAGGCGCGCCGACA 6 GCCGAGTTTTCTGCGCTTCCTTCTCCCTCTCTCCAThe underlined GACGTCGTGGTCGTTCGGTCCTATGTCGCGCCGG exon inclusionGCCCTCCGGAGGCTGAGGGGGGAACAGCGCGGC sequence isCAGGAGCCCCTCGGGCCCGGCGCCTTGCATTTCG SEQ ID NO:ATCTCCGTGATGACGATGACGCGGAAGAAGAAG 26. 
GGCCCAAGCGGGAGCTTGGTGTCCGGCGTCCCGGGGGCGCAGGGAAGGAGGGCGTCCGAGTCAAC AACCGCTTCGAGCTG GAAAAATGGACATTTTCCTCTCCCCCTAAAAAAAGATAAAACTCCTTCCT GGTTATTAACTGAAATGCTGATCGAGCTTTATCCTAAAGAAGATCAGTCGTGGACAAGAACCT TGTGAAATGTTCCCCATTTGAGACCCTAAAACTAATGAAAATCACAGCTTTTGG ATAAACATTG ACGATCTTGAGGATGACCCTGTGGTGAACGGGGAGAGGTCTGGCTGTGCGCTCACAGACGCTGTGG CACCAGGGAACAAAGGAAGGGGTCAGCGTGGAAACACAGAGAGCAAGACGGATGGAGATGACAC CGAGACAGTGCCCTCAGAGCAG 42420 PRR5-GTATTTGAAGTACACACTGGACCAATACGTTGA 7 ARHGAP8GAACGATTATACCATCGTCTATTTCCACTACGGG The underlinedCTGAACAGCCGGAACAAGCCTTCCCTGGGCTGG exon inclusionCTCCAGAGCGCATACAAGGAGTTCGATAGGAA A sequence isGACGGGGATCTCACTATGTGGCCCAGGCTGG SEQ ID NO:TCTCGAACTCCAAGCTCAAGCGATCCTCCCAC 27. CTCAGCCTCCCAAAGTACTGGGATTACAGGCAGGAGCCACCATGCCAAGCCAACACTCTTGTT CTTAAAGGGCCAGACAGTCAGCATTTTAGCTTTGCAGGCCTGTTGCTCTATTGCAACAACTCTG CTGGACTGTGTTCCAGTAAAACATTATGGACGCTGAAATGTGAATTTCATGTCATTTTCACGTG TCATGAAATATTCTTCTGTTTTTTTTTTTCAACCACTTAAAAACATAAAAAGCCATTTTTAGCTT GCAGCCTGTACCAAAGCAGGAAGCAGGCTAGGTTCATCCTGCCTGCCCATTCTCCCACCCCTG GTCCAGTGAATTACTGGCAAAGAAACAACTGCATGACCGTTTCTTCACTAAAGCCTCTTCTTG CTTTCACAGCCCTTTACAGTCTGCAAGGGGCATTCTGATGCCTCTTGTTGGTGAGATGGCAGCC TCATTTTACAGATGAGGACATAGGCCCCAGGGAGCAAGTGACTTACCCGTGGTCACTCAGCTT GTGTGTGGTAGGGCAGGATCCCACCCCAGGCCCCCGCCTCCCTCTCCCACCCAACGCTACTCA CCGCTTGGCCATGGCCTGGAGCCGGCAGACTTTTCCTGAGGGACGTCCGGCCTAATAATCAAC TTGGCAATATATCTGGCTCGTAGACTGCGGCGATGGGCGTTGATGTGGATATCCTAGATTCCT CTGGGTTTTCCTTCTTCAAAGTCCTTTCAAACCTGTAACAGAAATCTGCTTCACAGATATCTGA GTCAGTGGGACAGTGGAAGGCAGTGCCTGAATGTCCCAGAAGTCCTCCCTCCAGTTGCCTTTT GGGTCCTGCTGTCATTATCAATAGGACCTTCGGAGGGACTTCTTGGTTCCCCATCCTATGTCTT AGGGAAAGAATTGTTGCTGTATTTTGCAGTCATTTACTGGGCACCTGTATAAGCTGGAGATGG CCTAGCCCCAGCGCATGTCCTCCTCCAGGAAGGCTTCCTGGGTTGTCCTGGGAGAATCAATA GCCCCTTCCCTGCAGCCTCACTGTGCCTAAGCAGACACCAATCCTAGCTAGCACTTAGGGGTTT GTGAACAGGTCTGCCTCCTGCACTAGGCTGTGATCCCGGACCTGTCTCTGCATCCCTTGCAGG TGGGAAAGGATCTGCATATGGCAGCCTTTTTTTTTTTTTTTTTTTTTTTGAGACAGAGTCTCATT CTATTGCCTGGGCTGGAGCACAGTGGCGAGATCTCGGCTCACCACAACCTCCACCTCCCAGGT 
TCAAGTGATTCTCCTGCCTCAGCCTCCTGAGTACCTGGGACTACAGGCGTGAGCCACCATGCC CGGCTAATTTTTGTATTTTTAGTAGAGACGGGGTTTCACTATGTTGGCCAGGCTGGTCTTGAAC TCCTGACCTCGTGATCCGCCTGCCTTGGCCTCCCAAAGTGCCGGGATTACAGGCGTGAGCCAC TGTGCCCAGCCGGCAGGCTTTTATTAAGCGTTAGATGGGAGGATAGAGGAGTGAAGTGGTACT GGCAGGAAGTACCAAGGTTCCAGCTGGCGTAATCAGGAAGGCTGCATGGAGGAAGCAGCCTT TGAGCTGCCTGTGGAGTGGTGGGCAGGGTGTTGTGAAGTGGCAATCACTGGATTTTGCTTCTG GTACGAGGTGTGGCCAGATGCAAGAAAGAGCAGGGTGGACTTTGGTGCAATTGGTGGGGGTC TGGTCTGTAGGGTTCCCGTGGGGAGCCGTGGAGGGAGGCAGCAAAGGAGGGAGGGGCACAG AGGATGCTGGACTGTGTTTAAGAGGCAGCAGGGAGCCATGGCAGGTGCTTGAGGAGAAGCGA GTGATGTGTTTAAAGCAGCCCTTTCAGGAGGCTCAGGCTCACAGCAGGATGTGCACAGTAGC CCTGTCTTGAGCTAAAGCAGATGAAGGTTTTGCCCTCTGCACTTCCCCACGTGAGAAACGAAG ATGCACCCGCAGATTCCTTGAGGCAGCTCCCCCACTTCTCAGTTGCCAGAAATCAGCCCAGAG AAACAAACCCGTAATCAGCCCAGGGTGCTTTCCCTTCCCTTTCTCGAGGGGGCTGCTGGTTCGC ACATAAGGAGTGGGTCACTCCCGCTTGGGAGAAAGCAGCAGAATTCCTTCACAGCCAGGTAA GATGTGCCAGTGGTCGATGGATGAAATCTAGCCGGGGAGTTGGAATCTGTGTTGCCAGCAGT GACCTGTGAGCAGTGACAAAGCCAAAG GTACAAGAAGAACTTGAAGGCCCTCTACGTGGTGCAC CCCACCAGCTTCATCAAGGTCCTGTGGAACATCTTGAAGCCCCTCATCAG 4322 WDR45B AATTGTGGTGGTTTTGGACTCCATGATTAAGGTG 8TTCACATTCACACACAATCCCCATCAGTTGCACG The underlinedTCTTCGAAACCTGCTATAACCCCAAAG ATGGAG exon inclusionTGTTTGATGATGTCTCTCTGAACCTCAGAGAC sequence isGTCTCTTAGGCTGACCTTCACCCAGGCGAGA SEQ ID NO:AGCACTCCCTCAGCAGAGCCAGCCCACGTGC 28. ACTCGCCGAGCTCCAGGCCTGGCGCTGGCTACCTGCCTCCAGAGCTTTTTCTTCAGGAACACT CCTTTTCTGTGTG TAATGATCTGGGATGACCTGAAGAAGAAGACTGTTATTGAAATAGAATTTTCT ACAGAAGTCAAGGCAGTCAAGCTGCGGCGAGAT AG44438 VPS29 TTGGTGTTGGTATTAGGAGATCTGCACATCCCAC 9ACCGGTGCAACAGTTTGCCAGCTAAATTCAAAA The underlinedAACTCCTGGTGCCAGGAAAAATTCAGCACATTC exon inclusionTCTGCACAGGAAACCTTTGCACCAAAGAGAGTT sequence isATGACTATCTCAAGACTCTGGCTGGTGATGTTCA SEQ ID NO: TATTGTGAGAGGAGACTTCGATGAGGCTGGGCA 29. 
CAGAGTAAGTTTCTTCACTTAGCTCCTACTAACAGTGGTGGTTGGGTGGCTGTTTACTGACTG GATTTCTTACCCTTTTAAGGTCTGTTGAAAGGAAGTAACCGAATTCCCATGCTTTGATTGGGTT GGCTCTTTATTTTAATTTAATAAGACTGCCATTTCCAGGATCTTTTGCTTTCTTAAAGGACTCT ATCATCTATGTCTATCCCGATTTGTCAAAGTGTGGAATTTGGGCGGGAACATGTTTCAAAGTAT GACACGTGTTATGTAACACTATTTCCCCATAACTTTGTCATCAGCAGGAAACCAGAGGATTCTG TCCTAGTAAGGATCCCTACTAATTTGAAATGATTGTGTGGTCATTCATACAGTTATATCTTTAG ACTGCTAATAGTCTTGAGTCTTGGAGATAATCCACAGTACTTTATAGAATTAGGTCATCAATCA TTATAAAGTACCATGTCTTACTAATGTTCTTTCTGGTACATTCAGATTGAACAGCTCATTCATT ATTAGTACCAAACATTTCAACCTGTTGTAGACATATACCCTTTTATGAGTTTGGGGTGGTGGTT GTTGTTGTTGTTCTTCTTCTTCTTTTAAATATAGAAATCTATTATTTTTACCTTTTTCTCAAAGCA AGATTCCCATACTAACTATGTACTTCAATCCATATCAGAAGGAATCCCCCTCTAAAATGAAGAT TGTTCTATATCCAG GAGCCTGAGGAAGAGGGCGGCGACGGTGGTGGTGACTGAGCGGAGCCCGGT GACAGGATG 48175 E4F1ATCTTCCTGCGGCGCGTTGCGACATGGAGGGCG 10 CGATGGCAGTGCGGGTGACGGCCGCTCATACGGThe underlined CAGAAGCCCAGGCCGAAGCCGGGCGGGAAGCG exon inclusionGGCGAGGGTGCAGTTGCGGCGGTGGCGGCGGCC sequence isTTGGCCCCCAGCGGCTTCCTCGGCCTCCCGGCGC SEQ ID NO: CCTTCAGCGAGGAAGCTTGGAGAAGGGCAGTG 30. CCCTCATGGCGAGGAGTCCCTTTAGAGGTTGCTGGGCCTGCTTGTGGCCTTGTCTGGTGTGA AATGGGCTGG ATGAGGACGATGTGCACAGATGCGGCCGCTGCCAGGCAGAGTTCACCGCCTTGGA GGATTTTGTTCAGCACAAGATTCAGAAGGCCTGCCAGCGGGCCCCTCCGGAGGCCCTGCCTGCCAC CCCTGCCACCACAGCGTTGCTGGGCCAGGAG 49765TEN1- GGGGCGATGTCCGCGTCGTGGCTGGGGCCGGTC 11 CDK3GCGGGGCAGACTAATCCCCTGCTCCTGGCCAGG The underlinedGGAGGCTCCCGAGCGGATCCTCGGGAAAGGGGC exon inclusionTCCGAAGGTCAAGAAACTGCCCTGCTGGGCGTC sequence isCGGGGAGTGGGAAAATAAAGCACTTTTTGTATC SEQ ID NO:CCGCCCCTCCCCCGTCACGTGACCACGCGAGGC 31. 
GGAAAGAAGAAATCCGAGGACCGGCGACGCCTAGAACAG GGTCTTACTCTATTGCCGAGGCTAC AGTATAGTGGTGTGATCATAGCTCACTGCAGCTTCAACCTCCTGTGGTGGTGATCCTCCTGCCT CAGCCTCCTAAGTTGCTGGGACTACAG GAGCCCATGATGCTGCCCAAACCTGGGACCTATTACCTC CCCTGGGAGGTTAGTGCAGGCCAAGTTCCTGATGGGAGCACGCTGAGAACATTTGGCAG 5134 PLEKHA6GCAACTCGCACAGCCCGCAAAGCCGTCGCCTTT 12 GGCAAGCGCTCACACTCCATGAAGCGGAACCCCThe underlined AATGCACCTGTCACCAAGGCGGGCTGGCTCTTC exon inclusion AAACAGTTGCTGAGTGCTTGTTATGGCTGGAT sequence is ACCTTGCTGGCTCTGGTGATAAAGAGATGAASEQ ID NO: AAAGACAAAAGTTCCTCCCTGCAAAGAGCTCA 32.TGGTGCAATGGAAGAGATAGAAAGCTGCATT GTGACAG ATCGACCTTGGACATGTCCAATAAAACAGGTGGGAAACGCCCGGCTACCACCAACAGTG ACATACCCAACCACAACATGGTGTCCGAGGTCCCTCCAGAGCGGCCCAGCGTCCGG 56552 GNAZ GGCAAAGCTCAGAGGAAAAAGAAGCAGCCCGG 13CGGTCCCGGAGAATTGACCGCCACCTGCGCTCA The underlinedGAGAGCCAGCGGCAACGCCGCGAAATCAAGCTG exon inclusionCTCCTGCTGGGCACCAGCAACTCAGGCAAGAGC sequence isACCATCGTCAAACAGATGAAGATCATCCACAGC SEQ ID NO:GGCGGCTTCAACCTGGAGGCCTGCAAGGAGTAC 33. AAGCCCCTCATCATCTACAATGCCATCGACTCGCTGACCCGCATCATCCGGGCCCTGGCCGCCCTCAG GATCGACTTCCACAACCCCGACCGCGCCTACGACGCTGTGCAGCTCTTTGCGCTGACGGGCCCCGCT GAGAGCAAGGGCGAGATCACACCCGAGCTGCTGGGTGTCATGCGACGGCTCTGGGCCGACCCAGGG GCACAGGCCTGCTTCAGCCGCTCCAGCGAGTACCACCTGGAGGACAACGCGGCCTACTACCTGAAC GACCTGGAGCGCATCGCCGCAGCTGACTATATCCCCACTGTCGAGGACATCCTGCGCTCCCGGGAC ATGACCACGGGCATTGTGGAGAACAAGTTCACCTTCAAGGAGCTCACCTTCAAGATGGTGGACGTG GGGGGGCAGAGGTCAGAGCGCAAAAAGTGGATCCACTGCTTCGAGGGCGTCACAGCCATCATCTTC TGTGTGGAGCTCAGCGGCTACGACCTGAAACTCTACGAGGATAACCAGACA GGAAGTGGTGAACT GGGGAGTCAGACAAGAGCATCATGCTTCTTAAAAGCCCAGACCCCTGGCTATAACACATCGA AGATTCTCAGAAGAGAATTGAGGAGCGGACAGGCGCCACACTCCGTTGTGGTCACTGCCTCTT CCTGGCCCACCACACTCCTGTCCTCTGCATGTACTGAGAGCTCTGTCCAGGATGCCAGGGTCC TGCCTCGGCAGAGAGGCGGTGCCAGATGCCCCACAGCAGCTGGTGGGAGTGCCCACAGCTGG AGGGCAGGGGAGGAGCCTGGCCTCTGGCTGGTGTTTCCTTCCCAGCTCTCAAGAACTGGAGAC TTTGGTTACAGAAGTGAAGGCTGCTCCCTCACAGACTTCCTAGTGTCCGATGGTACCACATGGA AGGATCAGAGTTTTGAAGGACTGGGCCAGAACCCAGATAGGGCACAAGGCTGCCAGCGCCTG CATTGAGGGAGCTATGATGTGACGGGGGCTCCTGCAGAAGATGGCCTTCCTTGTACAG 
AGTCG GATGGCAGAGAGCTTGCGCCTCTTTGACTCCATCTGCAACAACAACTGGTTCATCAACACCTCACTCA TCCTCTTCCTGAACAAGAAGGACCTGCTGGCAGAGAAGATCCGCCGCATCCCGCTCACCATCTGCTT TCCCGAGTACAAGGGCCAGAACACGTACGAGGAGGCCGCTGTCTACATCCAGCGGCAGTTTGAAGA CCTGAACCGCAACAAGGAGACCAAGGAGATCTACTCCCACTTCACCTGCGCCACCGACACCAGTAAC ATCCAGTTTGTCTTCGACGCGGTGACAGACGTCATCATACAGAACAATCTCAAGTACATTGGCCTTTG CTGAGGAGCTGGGCCCGGGGCCCGCCTGCCTATGGTGAAACCCACGGGGTGTCATGCCCCAACGCG TGCTAGAGAGGCCCAATCCAGGGGCAGAAAACAGGGGGCCTAAAGAATGTCCCCCACCCCTTGGCC TCTGCCTCCTTGGCCCCACATTTCTGCAAACATAAATATTTACGGATAGATTGCTAGGTAGATAGAC ACACACACATGCACACACACACATCTGGAGATGGCAAAATCCTCTAAAATGTCGAGGTCTCTTGAA GACTTGAGAAGCTGTCACAAGGTCACTACAAGCCCAACCTGCCCCTTCACTTTGCCTTCCTGAGTTG GCCCCACTCCACTTGGGGGTCTGCATTGGATTGTTAGGGATAGGCAGCAGGGCTGAGGCAAGGTAG GCCAACTGCACCCCTGTCGCCTGGAGGAGGGCCAGCTCGCTGCCCGAGCTCTGGCCTAGGGACCTTG CCGCTGACCAAGAGGGAGGACCAGTGCAGGGTCTGTGCACCTTCCCTGCTGGCCTGCACACAGCTGC TCAGCACCACTTTCATTCTGGACCTGGGACCTTAGGAGCCGGGTGACAGCACTAACCAGACCTCCAG CCACTCACAGCTCTTTTTAAAAAACAGCTTCAAAATATGCAGCAAAAACCAATACAACAAAACGAGT GGCACGATTTATTICAAACTAGGCCAGCTGGGATTCCAGCTTTTCTTCTACTAGTCTGATGTTTTATA AATCAAAACCTGGTTTTCCTTCTCTGACATTTTTTTTTTGTTTTGTTTTTTGGTTTTTTTTTTTTTTTGGC CAAATCTCGTGGTGTTTCGCAGAAAAAAATCCAGAAAATTTCAAATGCAGTTGAGTATTCTTTTTTA AATGCAGATTTTCAAAACATATTTTTTTTCAGGTGGTCTTTTTTGTGTCTGGCTTGCTGAGTGTAAAA GTTGTTATCTGGACGATCTGTCTCTCTGCTCCAAAGAAATTTTGGAGTGAGTGGCAGTCCTGCGCCA GCCTCGCGGGACACGTGTTGTACATAAGCCTCTGCAGTGTCCTCTTGTTAATGGTGGGGTTTTCTGCT TTGTTTTTATTTAAGAAAATAAACACGACATATTTAAAGAAGGTTCTTTCACCTGGGAGCAAATGAA CAATAGCTAAGTGTCTTGGTATTTAAAGAGTAAATTATTTGTGGCTTTGCTGAGTGAAGGAAGGGG AGCAAGGGGTGGTGCCCCTGGTCCCAGCATGCCCCGCGCCTGAGACTGGCTGGAAATGCTCTGACT CCTGTGAAGGCACAGCCAGCGTTGTGGCCTGAGGGAGGCCCTGCTGGGACCCTGATCTGGGCCTTCC TGTCCCAGGGCCTATGGGCAACTGCGTTGAAAGGACGTTCGCCAAGGGCCGTGTGTAAATACGAAC TGCGCCATGGAGAGGAGAGGCACTGCCGGAGCCCTTGCCAGATCTCCCTCCCTCTCTCCGTGCAGTA GCTGTGTGTCCGAGGTCAGTGTGCGGAATCACAGCCAAGGACGTGAAGAGATGTACGGGGGAAAG AGAAGCTGGGGATTGGATGAAAGTCAAAGGTTGTCTACTTTAAGAAAATAAAATACCCTG 5696 TTC3 
CCGTCGGCTGACGTGGAGGGCCGGAGGTGGCGG14 CGGCGGCGGCGGCGGCTGCTGCTGCTGCTGCCC The underlinedGCGTCCGAGGCTCGCGGGCGGCGGGCCCGG TAT exon inclusionTTGATAAATTCAAAATATATGTAAAACATATG sequence isCAAGCTGTATAGCAGAACAATAAAATGAACAC SEQ ID NO:CTATGAATTCACCACTCAATCCAATAATCAAA 34. ATGACCAGTATTGAATGTGCTTACTTCCAGAGAAATGCACTCGGTGATGGAAAGAGAGCCACTAT TCTGAAGAACACTTGGCCAAAG 57139 RNF8GGCGAGCGGAGCCTGCTTTCGCAGCGATCGCGA 15 GCGTGTGGCGATTGCTTCTGTCTGTTATTTAGATThe underlined ATGGAAGCTGAGGGGATGCACAGAGGCAGCCA exon inclusionGAACCTAGGTCAGGGTCTCGCTCGGTGCTGACC sequence isGCCCCCGGGGTCGAGTAGGCGATGGGGGAGCCC SEQ ID NO:GGCTTCTTCGTCACAGGAGACCGCGCCGGTGGC 35. CGGAGCTGGTGCCTGCGGCGGGTGGGGATGAGCGCCGGGTGGCTGCTGCTGGAAGATGGGTGCGAG GGTTGTTATGAACTAGACTGGTCCAACAGGAAAGTATGATAGATGTGAACTGGGGCTTCTTTT CAACCTTTTCCGGAAGCTCTCAAGCTGTTCTTGTGGATAAGACAGAGAATATGTACTCCAATG CAAAGACTTTTGGTTGAATTATAACTGGCTGA AGGTGACTGTAGGACGAGGATTTGGTGTCACAT ACCAACTGGTATCAAAAATCTGCCCCCTGATGATTTCTCGAAACCACTGTGTTTTGAAGCAGAATCCT GAGGGCCAATGGACAATTATGGACAACAAG 57874ZDHHC13 CCAGCAGGAAGTGGGAGAAGAGGCGACCCAAG 16GCGGGCTGGCGGGCTGGCGGCAGTCGCTACTTG The underlinedCCTAGTAGCCTCAGCCGCTGTGGGCTCCTGGGG exon inclusionAGATGGAGGGGCCGGGGCTGGGCTCGCAG CCTT sequence isGACTTGAGCCCTGGAAATAAGCATCAGTGCA SEQ ID NO:GACGAGTGCTCTATGAGAAGCTATCTAGTTAA 36. AGCTCAAGGAGCCACAAAGGGATTTCCTGGCAGCACAGTCACCAGAAACACTGAGGGAGAAC TCTCTGAACAGAGGAATTGTGACCCCAAGACAGTAGTTTTTAGACGTGACACCAAAAGCACAA TCCATAAAAGAACAAATTGATAAATTGGACTTTTTTAAAATTTAAAACTTCTGCTCTATGAAAC AGACTTTTAAGAGATGGGAAG TGCAGGAATCACAGCCATGGCCCCCACCCTCCAGGATTTGGTCGA TATGGCATCTGTGCACATGAAAACAAAGAACTTGCCAATGCAAGAGAAGCTCTTCCTCTTATAGAG GACTCTAGTAACTGTGACATTGTCAAAGCTACTC A60615 SH3GLB2 ATTTCCCGGCACCTTCGTGGGCACCACAGAGCCC 17GCCTCCCCACCCCTGAGCAGCACCTCACCCACCA The underlinedCTGCTGCGGCCACTATGCCTGTGGTGCCCTCTGT exon inclusionGGCCAGCCTGGCCCCTCCGGGGGAGGCCTCGCT sequence isCTGCCTGGAAGAGGTGGCCCCCCCTGCCAGTGG SEQ ID NO:GACCCGCAAAGCTCGGGTGCTCTATGACTACGA 37. 
GGCAGCCGACAGCAGTGAGCTGGCCCTGCTGGCTGATGAG CTCCCAGGGTGCCAT GTGAACCACC TGCGCTGCCTCCACGAGTTCGTCAAGTCTCAGACAACCTACTACGCACAGTGCTACCGCCACATGCT GGACTTGCAGAAGCAGCTGGGCAG 62560 ITFG1GAATTTATCATGGCATCCAGCATTGACCACTACA 18 AGTAAAATGCGAATTCCACATTCTCATGCATTTAThe underlined TTGATCTGACTGAAGATTTTACAGCAG CCATAC exon inclusionCACCCTGAACGCGCCCCATCTCTTCTGATCTC sequence isGGAAGCTAACCAAGGTCAGACCTGGTTAGTG SEQ ID NO:CTTGGATGGGAGATCACCTATTACTTTTTCT T 38. TTCAATGGTGATCTAATTCCTGATATTTTTGGTATCACAAATGAATCCAACCAGCCACAGATACTAT TAGGAGG 6785 SPATS2CTGCTGGCTACCAATATTCTACTTTCTGTCTCTAT 19 GAATGTGACTACCCTGGTTACCTCATATTTATTT The underlined GCAGTGACTTAAAATTTGGAGGCAAATTTTCC exon inclusionTTAAGAGGATATCAAGTTCCAGTATCTTCAGA sequence is TGTTGATAAGCCGTTAGAATCTCCCTGGAAAA SEQ ID NO: GGAGACATGAATGTCTGCAATGATACTTCCTGA 39.CAAGAAGTTGATACAAGAAAAGGAAAGGAGAT TAACAGCTAGTGAGCAGAATTTCGAACAGCAGGATTTCGTATTTTTTGCTTCCAACTGCACACTTCCG TTGCCCACTTTTAAATCAGAGATACCTACACTCAAAACCCAGACAAGGCAAAAGGATACTTTTCTTG TATATTTTTTGAGATCGAAGAAACGACAATGTCCAGGAAACAGAACCAGAAGG 8742 DHRS11 GATCGGACCCAAGCAGGTCGGCGGCGGCGGCAG 20GAGAGCGGCCGGGCGTCAGCTCCTCGACCCCCG The underlinedTGTCGGGCTAGTCCAGCGAGGCGGACGGGCGGC exon inclusionGTGGGCCCATGGCCAGGCCCGGCATGGAGCGGT sequence isGGCGCGACCGGCTGGCGCTGGTGACGGGGGCCT SEQ ID NO:CGGGGGGCATCGGCGCGGCCGTGGCCCGGGCCC 40. TGGTCCAGCAGGGACTGAAGGTGGTGGGCTGCGCCCGCACTGTGGGCAACATCGAG GAATTTTGAG TCTAGAGGAGGAAGCGGGAAGATGTACACCAGGGGAGGGGAAAGCTGCAGTCTTCCTTGCCC ACAGTCTGCTTTGATTGATTCAGTCATTGATGTTAAAGCAGAATTTGGGTTCTAGCTTCCTACA GAGAAAACTCCTGTTTCCTGAAGTGATCAAATGAGCTGGCTGCTGAATGTAAGAGTGCAGGCTAC CCCGGGACTTTGATCCCCTACAGATGTGACCTATCAAATGAAGAGGACATCCTCTCCATGTTCTCAGC TATCCGTTCTCAGCACAGCGGTGTAGACATCTGCATCAACAATGCTGGCTTGGCCCGGCCTGACACC CTGCTCTCAGGCAGCACCAGTGGTTGGAAGGACATGTTCAAT
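As a non-limiting illustration of how the entries of Table 6 relate to one another, the following Python sketch locates an exon inclusion sequence within its event cDNA sequence and splits the event sequence into the upstream flank, the included exon, and the downstream flank. The function name `split_event_sequence` and the toy sequences are hypothetical and are not taken from the sequence listing. The sketch assumes that the exon inclusion sequence occurs exactly once, as a contiguous substring, in the event cDNA sequence.

```python
from typing import Tuple


def split_event_sequence(event_cdna: str, exon: str) -> Tuple[str, str, str]:
    """Split an event cDNA into (upstream, exon, downstream) around the exon.

    Whitespace is removed and letters are upper-cased before matching, so
    sequences copied from a wrapped table can be used directly.
    """
    event_cdna = "".join(event_cdna.split()).upper()
    exon = "".join(exon.split()).upper()
    if not exon:
        raise ValueError("exon sequence is empty")
    start = event_cdna.find(exon)
    if start == -1:
        raise ValueError("exon sequence not found in event cDNA")
    # Reject ambiguous matches (including overlapping ones).
    if event_cdna.find(exon, start + 1) != -1:
        raise ValueError("exon sequence occurs more than once")
    end = start + len(exon)
    return event_cdna[:start], exon, event_cdna[end:]


# Toy sequences for illustration only; they are NOT taken from Table 6.
toy_event = "ATGGCC TTTAAACCC GGATAG"
toy_exon = "TTTAAACCC"

upstream, exon, downstream = split_event_sequence(toy_event, toy_exon)
# Joining the flanks gives the junction expected when the exon is skipped.
skipped_isoform = upstream + downstream
print(upstream, exon, downstream, skipped_isoform)
```

Joining the two flanks yields the junction sequence expected when the exon is absent from the transcript. Requiring exactly one match is a deliberate choice in this sketch, so that a repeated element in the event sequence raises an error rather than being resolved silently.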

TABLE 7
Exon Exclusion Event Sequences
Splicing Event ID | Gene Name | cDNA Sequence | SEQ ID NO:
1506 CENPK AATCTTTAATGAACTGAAAACTAAAATGCTTAA 41TATAAAAGAATATAAGGAGAAACTCTTGAGTAC The underlinedCTTGGGCGAGTTTCTAGAAGACCATTTTCCTCTG exon exclusionCCTGATAGAAGTGTTAAAAAGAAAAA GGGAAC sequence isAACGGTGGTTGGATGAACAGCAACAGATAAT SEQ ID NO:GGAATCTCTTAATGTACTACACAGTGAATTG 73. AAAAATAAGGTTGAAACATTTTCTGAATCAAThe sequence G TTCCAAAAGCTGAGACAAGATCTTGAAATGGT without theACTGTCCACTAAGGAGTCAAAGAATGAAAAGTT underlined AAAGGAAGACTTAGAAAGexon exclusion sequence is SEQ ID NO: 105. 2098 METTL5AACTTCGATATGACCTGCCAGCATCATACAAGTT 42 TCACAAAAAGAAATCAGTAAGTCTCTTGATTTTGThe underlined GCTGGTCTACATTCGGTATTGAAAAGCTTTCTGG exon exclusionGCCGGATGTGGTGGTTCATGCCTGTAATCCCAGC sequence isTACTCGGGAGGCTGAGGCAAGAGAATCGCTTGA SEQ ID NO:ACTCAGGAGGCAGAGGTTGCAGTGAGCTGAGAT 74. TGCCCCACTGAACTCCAGCCTGCGCGATAAGAGThe sequence TGAGACTCAGTCTCGAAAAAGAAAAAAAAAGGA without theAAGCTTTGTGACAAGTAATTATTTCTAGTGTTAC underlinedCAACTTTCCTGTGTAAATATACAAAGCCAGCCTA exon exclusionGGAGACACCATAAATGGCCTGTGGGAAAGGCCC sequence isATCGTCAATAGCTAATATTCTAGTTCTTTCCTAA SEQ ID NO:ATGCTTTGGGTACAAAAAGAAAAAAAAAATCAA 106.AAACTGTTTTTGCTCTTTTCATATAGTATATATTT TATTAGTTAGTTTGTACTAATACATTCTCATATTACAAAGGCAATTTAATGGAAGAATCTTCCTTTTGA TATTTGAATCATCTGAAATAACACAAACAGAACAATACATTCAAAGAAATCTCATTTGCATAACAA AAAGACAAGTTAAACAACAAAAAAATTTTTCCTTTCTCACAGGTGGACATTGAAGTGGACCTAATTC GGTTTTCCTTTTAAAAGCCCCGCAAACAAAAGTCGTTTAAAACCTATTTAAAATGAATAAAAAATTGG TT CATGTTCAAAAGAAAGCTGCAGAATGGAAAATCAAGATAGATATTATAGCAG GGACAGATAT GGCTTTTCTAAAGACTGCTTTGGAAATGGCAAGAACAGCAGTATATTCCTTACACAAATCCTCAACTA GAGAA 2242 PLA2R1ATTCCAAGTCACAATACCACTGAAGTTCAGAAA 43 CACATTCCTCTCTGTGCCTTACTCTCAAGTAATCThe underlined CTAATTTTCATTTCACTGGAAAATGGTATTTTGA exon exclusionAGACTGTGGAAAGGAAGGCTATGGGTTTGTTTGT sequence is GAAAAAATGCAAGCTTTCATTACTATGAATCTT SEQ ID NO: TTTGGCCAGACCACCAGTGTGTGGATAGGTTT 75.ACAAAATGATGATTATGAAACATGGCTAAATG The sequenceGAAAGCCTGTGGTATATTCTAACTGGTCTCCA without the TTTGATATAATAAATTGCCTTCTGCTGAATATCC 
underlined CCAAAGACCCAAGCAGTTGGAAGAACTGGACGCexon exclusion ATGCTCAACATTTCTGTGCTGAAGAAGGGGGGA sequence isCCCTGGTCGCCATTGAAAGTGAGGTGGAGCAAG SEQ ID NO: 107. 7106 RHOHAGAGAGAGAGAGAGAGAGAGGAGAGGAGGGGC 44 GGGGTGGGGGAGGAGGGGAGTGGGGAGAGAGAThe underlined AAGAGAGAAACACCAAAAAGACATTTTCAAGGA exon exclusionAGGAAGAAAATTAGATGGCAACCCCCTGTCCCC sequence isTCCCCCTAAGAAAATCCTCTCTGAGATTAAACTG SEQ ID NO:TGTGAAGATTAGAGGCGTGTAGGTCAGGAGCAG 76. GAGGAAGCCCAACGCTGGACTGTACCAGATCATThe sequence CTAAAACTGGCAATTCCAGGCACAGAAAACCAG without theTTCTTCAGAAGCAGAAGGGTGGTCAGCCAGGGG underlinedGTGAAAGGGACAGGGGTCTCGCAGCCAG CCCAA exon exclusionCTGTTGTATTTTCAGTTCTTCCAGTGTGAATC sequence isAGTTAATATTCTCGGGAACGAGGGAGAGGTT SEQ ID NO:GATCCTATGAGGAAATCAACCACAGTGAAAA 108. GGCTTGGGCCGCTTTTGTTTTCACCTGCTTTTGTTGAACAAATTTGATTTCCGGAGTCAGTCAT TTTACTGTCAAGACATTTCTTCGGCATTCTGC AACAGTTTCCAACATGGCTAGATCCATCAGAAA CTGAAGCCGTGGAGAACGCTCTCGGGGCCTTTGCCACTTCTTGGAGTAGAAGCCGACAGAGAGCTG TTTGGAAACTTCTCCTTCACACACCAG 7108 RHOHGAGAGAGAAAGAGAGAAACACCAAAAAGACAT 45 TTTCAAGGAAGGAAGAAAATTAGATGGCAACCCThe underlined CCTGTCCCCTCCCCCTAAGAAAATCCTCTCTGAG exon exclusionATTAAACTGTGTGAAGATTAGAGGCGTGTAGGT sequence isCAGGAGCAGGAGGAAGCCCAACGCTGGACTGTA SEQ ID NO:CCAGATCATCTAAAACTGGCAATTCCAGGCACA 77. GAAAACCAGTTCTTCAGAAGCAGAAGGGTGGTCThe sequence AGCCAGGGGGTGAAAGGGACAGGGGTCTCGCAG without the CCAGTTCTTCCAGTGTGAATCAGTTAATATTC underlined TCGGGAACGAGGGAGAGGTTGATCCTATGAGexon exclusion GAAATCAACCACAGTGAAAAGGCTTGGGCCG sequence isCTTTTGTTTTCACCTGCTTTTGTTGAACAAATT SEQ ID NO:TGATTTCCGGAGTCAGTCATTTTACTGTCAAG 109. 
ACATTTCTTCGGCATTCTGCAACAG TTTCCAACATGGCTAGATCCATCAGAAACTGAAGCCGTGGA GAACGCTCTCGGGGCCTTTGCCACTTCTTGGAGTAGAAGCCGACAGAGAGCTGTTTGGAAACTTCTC CTTCACACACCAG 9442 QPRTGCCTGGCGCTGCTGCTGCCGCCCGTCACCCTGGC 46 AGCCCTGGTGGACAGCTGGCTCCGAGAGGACTGThe underlined CCCAGGGCTCAACTACGCAGCCTTGGTCAGCGG exon exclusionGGCAGGCCCCTCGCAGGCGGCGCTGTGGGCCAA sequence isATCCCCTGGGGTACTGGCAGGGCAGCCTTTCTTC SEQ ID NO:GATGCCATATTTACCCAACTCAACTGCCAAGTCT 78.CCTGGTTCCTCCCCGAGGGATCGAAGCTGGTGCC The sequenceGGTGGCCAGAGTGGCCGAGGTCCGGGGCCCTGC without theCCACTGCCTGCTGCTGGGGGAACGGGTGGCCCT underlinedCAACACGCTGGCCCGCTGCAGTGGCATTGCCAG exon exclusionTGCTGCCGCCGCTGCAGTGGAGGCCGCCAGGGG sequence isGGCCGGCTGGACTGGGCACGTGGCAGGCACGAG SEQ ID NO:GAAGACCACGCCAGGCTTCCGGCTGGTGGAGAA 110. TGTGGTGGCCGCCGGTGGCGTGGAGAAG GCGGTGCGGGCGGCCAGACAGGCGGCTGACTTCACT CTGAAGGTGGAAGTGGAATGCAGCAGCCTGCAGGAGGCCGTGCAGGCAGCTGAGGCTGGTGCCGAC CTTGTCCTGCTGGACAACTTCAAGCCAGAG 10439IL17RB TGGACATTTTCCTACATCGGCTTCCCTGTAGAGC 47TGAACACAGTCTATTTCATTGGGGCCCATAATAT The underlinedTCCTAATGCAAATATGAATGAAGATGGCCCTTCC exon exclusionATGTCTGTGAATTTCACCTCACCAG GCTGCCTA sequence isGACCACATAATGAAATATAAAAAAAAGTGTGT SEQ ID NO: CAAGGCCGGAAGCCTGTGGGATCCGAACATCAC 79. TGCTTGTAAGAAGAATGAGGAGACAGTAGAAGTThe sequence GAACTTCACAACCACTCCCCTGGGAAACAGATA without theCATGGCTCTTATCCAACACAGCACTATCATCGGG underlined TTTTCTCAGGTGTTTGAGexon exclusion sequence is SEQ ID NO: 111. 11685 STAU1AAAGCATAACCCCTACTGTAGAACTAAATGCAC 48 TGTGCATGAAACTTGGAAAAAAACCAATGTATAThe underlined AGCCTGTTGACCCTTACTCTCGGATGCAGTCCAC exon exclusionCTATAACTACAACATGAGAGGAGGTGCTTATCC sequence is CCCGAGAGTTTATTAACCACTTAACCTCTCAG SEQ ID NO: AACTGAACAAAGACAACATTGTTCCTGGAACG80. CCCTCTTTTTAAAAAAG GGGCTGCGGGCGCCTG The sequenceAGCGGCTCTTCAGCGTTTGCGCCGGCGGCTGCCG without theCGTCTCTCTCGGCTCCCGCTTCCTTTGACCGCCTC underlinedCCCCCCCCGGCCCGGCGGCGCCCGCCTCCTCCAC exon exclusionGGCCACTCCGCCTCTTCCCTCCCTTCGTCCCTTCT sequence isTCCTCTCCCTTTTTTCCTTCTTCCTTCCCCTCCTCG SEQ ID NO:CCGCCACCGCCCAGGACCGCCGGCCGGGGGACG 112. 
AGCTCGGAGCAGCAGCCAG 13451 LYRM1AGAGTACCCAGAGAAGGAGAAGCCAGCAAAGG 49 AGACGACACAGACAAGACCTCAGAGATCAAAGGThe underlined AAGAGGCCCCTTAATATCCTGGAATAATGGGAC exon exclusionCCATCCCCGTAATCAGTGAATCTCATCCACCCGC sequence isTTGCCAGCTTCTACCCGCAGCAAGTAGAAGCTA SEQ ID NO:AGTCCTGGCTCAAATCTCTTCCCTCCCTCCCTCTC 81. CCAGCTGTCAGTGCTTTTGGACTTGTGCTCAGAT The sequence GACAACGGCAACACGACAAGAAGTCCTTGGC without theCTCTACCGCAGCATTTTCAGGCTTGCGAGGAA underlinedATGGCAGGCGACATCAGGGCAGATGGAAGAC exon exclusionACCATCAAAGAAAAACAGTACATACTAAATGA sequence isAGCCAGAACGCTGTTCCGGAAAAACAAAAAT C SEQ ID NO:TCACGGACACAGACCTAATTAAACAGTGTATAG 113. ATGAATGCACAGCCAGGATTGAAATTGGACTGCATTACAAGATTCCTTACCCAAGGCCA 14574 PPARG CCATCAGGTTTGGGCGGATGCCACAGGCCGAGA50 AGGAGAAGCTGTTGGCGGAGATCTCCAGTGATA The underlinedTCGACCAGCTGAATCCAGAGTCCGCTGACCTCCG exon exclusionGGCCCTGGCAAAACATTTGTATGACTCATACATA sequence isAAGTCCTTCCCGCTGACCAAAGCAAAGGCGAGG SEQ ID NO:GCGATCTTGACAGGAAAGACAACAGACAAATCA 82. CCATTCGTTATCTATGACATGAATTCCTTAATThe sequence GATGGGAGAAGATAAAATCAAGTTCAAACAC without theATCACCCCCCTGCAGGAGCAGAGCAAAGAGG underlinedTGGCCATCCGCATCTTTCAGGGCTGCCAGTTT exon exclusionCGCTCCGTGGAGGCTGTGCAGGAGATCACAG sequence isAGTATGCCAAAAGCATTCCTGGTTTTGTAAAT SEQ ID NO:CTTGACTTGAACGACCAAGTAACTCTCCTCAA 114. 
ATATGGAGTCCACGAGATCATTTACACAATGCTGGCCTCCTTGATGAATAAAGATGGGGTTCTC ATATCCGAGGGCCAAGGCTTCATGACAAGGGAGTTTCTAAAGAGCCTGCGAAAGCCTTTTGGT GACTTTATGGAGCCCAAGTTTGAGTTTGCTGTGAAGTTCAATGCACTGGAATTAGATGACAGC GACTTGGCAATATTTATTGCTGTCATTATTCTCAGTGGAG ACCGCCCAGGTTTGCTGAATGTGAA GCCCATTGAAGACATTCAAGACAACCTGCTACAAGCCCTGGAGCTCCAGCTGAAGCTGAACCACCC TGAGTCCTCACAGCTGTTTGCCAAGCTGCTCCAGAAAATGACAGACCTCAGACAGATTGTCACGGAA CACGTGCAGCTACTGCAGGTGATCAAGAAGACGGAGACAGACATGAGTCTTCACCCGCTCCTGCAG GAGATCTACAAGGACTTGTACTAGCAGAGAGTCCTGAGCCACTGCCAACATTTCCCTTCTTCCAGTT GCACTATTCTGAGGGAAAATCTGACACCTAAGAAATTTACTGTGAAAAAGCATTTTAAAAAGAAAA GGTTTTAGAATATGATCTATTTTATGCATATTGTTTATAAAGACACATTTACAATTTACTTTTAATATT AAAAATTACCATATTATGAAATTGCTGATAGTATTTGAAGACTGAGTCTTGTGTGTTTCCCACCCTAG CCCCCAGGCTTTCTTTTTTACCCCTTTTCCTTCTCCCCTCCCTCCCTCCATCCCTCTCACTCTTCCTCCC TCCCTTCCCTCCTTTCCTTCTTCCTTTATTTTTCTTTTCTTTCTTAGACATTTTAAAATATGTGAGTGGA ACTGCTGATACACTTTCATTCTCAGTAAATTAATTTTTTACTCAAT 16269 BORCS8- ACAAAGATCATTCCACTCAGCCTGGGACGATGG 51 MEF2BGGAGGAAAAAAATCCAGATCTCCCGCATCCTGG The underlined ACCAAAGGAATCGGCAGCCCGGAGGAACCACC exon exclusion CCCGCCCTCCTCAGCCTGATCCTGGAAGAGAsequence is CTCGGGGCCCCCCAGCCTCCGCCAACCCAG C SEQ ID NO:GCCGTGAAGAACCTGGTGGACAGCAGCGTCTAC 83. TTCCGCAGCGTGGAGGGTCTGCTCAAACAGGCCThe sequence ATCAGCATCCGGGACCATATGAATGCCAGTGCC without the CAGGGCCACAGunderlined exon exclusion sequence is SEQ ID NO: 115. 16833 ENOSF1AGAAGCAAATGCTGGCACAAGGATACCCTGCTT 52 ACACGACATCGTGCGCCTGGCTGGGGTACTCAGThe underlined ATGACACGTTGAAGCAG GATCCCAGGATGCTG exon exclusionGTATCCTGCATAGATTTCAGGTACATCACTGA sequence is TGTCCTGACTGAGGAGGATGCCCTAGCCTGTC SEQ ID NO: TGGAAGTTACTTGTGGACATG 84. The sequence without theunderlined exon exclusion sequence is SEQ ID NO: 116. 
16929 DHRS4-GTGCCACTTCGGATAAACCCTTTGGACTCCTAAC 53 AS1TCCAATCAGGTGTCTGCTTTGTTGAGGACTCACA The underlinedGACACAGTCTCCTTTCTTCAAGATCTTTACAATG exon exclusionCAAGACCTCACTAACACACAGGGATGGTCTCCC sequence isAGAGGGTCTGTGCTGTTCCTTCACTCAGAACATC SEQ ID NO:AAGATGCACTGAAGTAAGGATCCTCTATTCTACA 85.GTTCCTGCTAGCTGAGCTATTCCATGGGGGCTTC The sequenceAGCAGGAAATTCCAAGGTTGGCTTTGACAAGCT without theAAGGCCGGCTGGTGGAGCACATCGAGTTCTGGA underlinedGGTTCATGTGTGTTTTCATGAAGATCTGTCTGCC exon exclusionCGTAGCAGATAAAGAGTTGTTGCCCCACTCCTCC sequence isTGGGGTCTTCTATTTTCCTGGGGGAATTTCTGG SEQ ID NO:ATTAACTGAACACACACACACACACACACACCC 117.TTTTGAAGCATCAACAGTAATTCTGAGTTCTTAG GGACAATGCAGATTAAATCCACAATAAGAAAGACAACTATGGCCAGGTGTGGTGGCTCACGCCTGTA ATCCCAGAACTTTGGGAGGCTGAGGCGGATGGATCACCTGAGGTCAGGAGTTAGAGACCAACCTGA CCAACATGGAGAAACCCCGTTTCTACTAAAAATGCAAAATTAGCCGGGCATGGTGGCAGGCGCCTG TAATCCCAAATACTCGGGAGGCTGAGGCAGGAGAATCACTTAAACCCGGGAGGCAGAGGTTGCAGT GAGCCAAGATCGCGCCATTGCACTCCAGC GGCCAGACTTTGGCAGCGTGTAAGGTCTGAGGACA GGGGCACCGGAGGCCGAGGATGAGAGGCCAGTGCCTGTTTCCAGGCAGCCAGGGCCTCAGA AACTCCGGCCGGAGCACTCACCCGTCGGTGGAGGCCGTTACCAGGGCCACCTTATTTGCGAG CGGGTCCCGGCGGGTCATCCCGGAGCTGGCCATCCGCACCGAATTCCAAGCCCGGGCACAGA GGCCTAGCAGCCCCGCCTTGTGCATGGATCAGACCAGCAA ACATGGGCCCCGTCCTGGGCCAAA CGCCGGGCGATGGCGAAGCCGATCCTGTGAGCAGAAAGAGACAAAGACTGCTAAGGCCTGTGCAGG GGAAGAGGTCGACAGTATGAGCTCTGAAGTTAAGACTGCCCGGGTTTGAATTCTGGCTCTTTCTCTA TATAACCCCTACGTGTGCCTACTATGTGTAAAACAGGCTTAATGGCATGGCCATTTTTGGCATTCCTT TACTTGTTTTTATTATGACCTGGACCACAGCCTCAGTTCCCAAGAACTGACATCACTTTCTACAGTTC CCACCATGGGTGACAGGCTTCATCCCCTCTTGGGACTGAGAG 16943 NDUFV2 CGCAGAATCTAGGCCTGCTCTGGCCAGATCAGTT 54TCGAAGACCGTCGCTCCGAAGGAGGCACCTCTC The underlinedGTTTCAAGCCTAGTGACCTCGATGCTTTTAGGTT exon exclusionGCAGCATACTGGAGAGCTCTGGCTTGCTTCGTGA sequence isAGGCTTAGGGAGAACTTCATTAGGGCTGGAAAA SEQ ID NO:GGGTGGCCAATGTTTGATTTACTGCAGTTGTGCT 86. 
TTGCATATCGGAAATGCTGGCTAAATAAACGGTThe sequence ATCAAACTAACTCTGAAAGAACGGCGCCGCAAA without theTAACAGCACCCAATTAAAGAACCACAGGATTTT underlinedAGAGATTAAATGATCTTTTTGAGATCCAAGTACA exon exclusion TCTCATGGAAAAATACCTAGGTTAGAATTACT sequence is AAATTAAAAAATGGACACTTGGGGCCAGGCGSEQ ID NO: CAGTGGCTTACGCCTGTAATTCCACCACTTTG 118.GGGAGCTGAGGCGGGCAGATCACTTGACATC GAGAGTTCAAGACCAGCCTGACCAACATGGA  GAAACCCCGTCTCTACTAAAAATACAAAAAAT TATCCAGACGTAGTGGCACATGCCTGTAATCTCAGCTACTTGGGAGGCTGAGGTAGGAGAATC GCTTGAACCCGGGAGGCAGAGGTTGTGGTGAGCCGAGATCATGCCATTGAACTCCAGCCTGG GCAACAAGAGCGAAACTCCGTCTCCAAAAAAAAAAAAAGACACTTATTTAGGCTTTCCATATA TCATG GGAAGACATGTAAGGAATTTGCATAAGACAGTTATGCAAAATGGAGCTGGAGGAGCTTTATT TGTG 18745 FER1L4GATCCCTGGAGTTGCAGCTACCAGACATGGTGC 55 GTGGGGCCCGGGGCCCCGAGCTCTGCTCTGTGCThe underlined AGCTGGCCCGCAATGGGGCCGGGCCGAGGTGCA exon exclusionATCTGTTTCGCTGCTGCCGCCGCCTGAGGGGCTG sequence isGTGGCCGGTAGTGAAGCTGAAGGAGGCAGAGGA SEQ ID NO:CGTGGAGCGGGAGGCGCAGGAGGCTCAGGCTGG 87. CAAGAAGAAGCGAAAGCAGAGGAGGAGGAAGGThe sequence GCCGGCCAGAAGACCTGGAGTTCACAGACATGG without theGTGGCAATGTGTACATCCTCACG CTGGGTGAAG underlinedGGGTTGGAGCATGACAAGCAGGAGACAGACG exon exclusionTTCACTTCAACTCCCTGACTGGGGAGGGGAA sequence isCTTCAATTGGCGCTTTGTGTTCCGCTTTGACT SEQ ID NO:ACCTGCCCACGGAGCGGGAGGTGAGCGTCCG 119. GCGCAGGTCTGGACCCTTTGCCCTGGAGGAGGCGGAGTTCCGGCAGCCTGCAGTGCTGGTCC TGCAG CTATGAGCTCAGAGTTGTCATCTGGAACACGGAGGATGTGGTTCTGGACGACGAGAATCCA CTCACCGGAGAGATGTCGAGTGACATCTATGTGAAGAG 19824 PHF14 GCAGTGCTCGGAATGTGACCAGGCAGGGAGCAG 56TGACATGGAAGCAGATATGGCCATGGAAACCCT The underlinedACCAGATGGAACCAAACGATCAAGGAGGCAGAT exon exclusionTAAGGAACCAGTGAAATTTGTTCCACAGGATGT sequence isGCCACCAGAACCCAAGAAGATTCCGATAAGAAA SEQ ID NO: CACGAGAACCAGAGGACGAAAACGAAGCTTC 88. 
GTTCCTGAGGAAGAAAAACATGAGGTTGGAAThe sequence TAAG GAAAGAGTTCCTAGAGAGAGAAGACAAA without theGACAGTCTGTGTTGCAAAAGAAGCCCAAGGCTG underlinedAAGATTTAAGAACTGAATGTGCAACTTGCAAGG exon exclusionGAACTGGAGACAATGAAAATCTTGTCAGGTAAG sequence isTTGGATGCTAAAACCTTGTCTTTAGGGGATGAAA SEQ ID NO:GTTCTATATTTATTTTCTCATCACAGAAAAAATG 120.AAAAAACAATTGCAGGATAAGACCTTTCTTAAA ATATTATATAGTGGAAACAGTACTTTAGAAACAGATTTCATCCACTTCTTAACCTCTCACACATGGT TATACTCTGGATTTAAATGTAAATAAGAGTGATAATCTGCCTGTTTAACACAGGGAATTATTTTTCTCT TGACAAGAGAAATTGACAGTGCTCTCTATTTAGAGGCCATGAAAGTAATTTGATCTAAACACTGTGTA CTAAGATTATTATGTTTTATGTCAGAAAACAATAAAGTTACTAAGCTCTGTTAGCATATTCTAAATGT TTGAAATTTAGAAGCAATGGTGAGAAGACAGACTTTTTATTGACAAGAACTTAATTAGCACTTTCTTA TTGCTTATCAAAACAAATGTGTTAAATGCTTCTCCCTTACGAAATAAAGAAAGGTGAAAAGATGGCC TAGGTTGATTTTATTTTTTGTTTTGTCTTTGTTTCTTTGTTTCGTTTTGGTACTTTATTTTTTTTTAATCA GACATAATGCTAATCAGAAATCTTAGCTGATGCTGCACATTGGCTTTTCCCAACGGTCCAGAGGCTGC TAATTTTAGCGGAAATGAAGACATTGATCAAAGCTCTGGTGAGATGGGGGAGTGAGTGTGTGAACA AAAAGAGAGCTAATTTAAAAGAGGCATCAGACTTTCAAAGGACAGTGTCACAAAAGTTCTTACAGTT CTTACAGGGACTTTGTAAGGGAATCCATTCTTATTTCTTTAAAAAATTGTCTTCTGGTAAAGCCCTGT TAAATTAACTGAGGACACAGAAATTAAACATTTCAAAAAGAATAAACATATTGATAAAACAAATAT ATTAGTGTTGTTGTATGTTTTTAAATACTTACTTCCAAATGATTTAATCTATTTTGGTCATTAAAATAT GTCTTAATTTCTCAAAGAAAGGCATGAAGTCTTAAATTTTATGAGTTTTTTATGCTATCAATGAGAAA GATAAAGTAAAAATTACAGTAGAAAAAGACAAAGTCCTTCAACAAAGTTAAGAAAGTTTATAATAAT TGGCTAATTTTTTTGAGGTAGTTCATGTAGAGTGTGTTGGGAGCTATCCTGAAGGTTAAGTTTATTAA AATTTAGGGTAAAGTAGTAAGTAGTTCCAAGTTCAGGAGATACACCTGAATAATTCTGACCACAGTA TAAATTTTGCAATATGTCGAAAATGAAATCCCAAGCATAAGCGTAACATAATGGAGTAAAT 19828 PHF14GCAGTGCTCGGAATGTGACCAGGCAGGGAGCAG 57 TGACATGGAAGCAGATATGGCCATGGAAACCCTThe underlined ACCAGATGGAACCAAACGATCAAGGAGGCAGAT exon exclusionTAAGGAACCAGTGAAATTTGTTCCACAGGATGT sequence isGCCACCAGAACCCAAGAAGATTCCGATAAGAAA SEQ ID NO: CACGAGAACCAGAGGACGAAAACGAAGCTTC 89. 
GTTCCTGAGGAAGAAAAACATGAGGTTGGAAThe sequence TAA GAAAGAGTTCCTAGAGAGAGAAGACAAAG without theACAGTCTGTGTTGCAAAAGAAGCCCAAGGCTGA underlinedAGATTTAAGAACTGAATGTGCAACTTGCAAGGG exon exclusionAACTGGAGACAATGAAAATCTTGTCAG sequence is SEQ ID NO: 121. 21024 BCL2L13GGGTTCAACTAGATATAGCTTCACAATCTCTGGA 58 TCAAGAAATTTTATTAAAAGTTAAAACTGAAATTThe underlined GAAGAAGAGCTAAAATCTCTGGACAAAGAAATT exon exclusion TCTGAAGGCCAGTGACATATCAGGCATTTCGG sequence is GAATGTACACTGGAGACCACAGTTCATGCCASEQ ID NO: GCGGCTGGAATAAG GGCACTGTGTTTAGTCTTG 90.AGTCAGAGGAGGAGGAATACCCTGGAATCACTG The sequenceCAGAAGATAGCAATGACATTTACATCCTGCCCA without theGCGACAACTCTGGACAAGTCAGTCCCCCAGAGT underlinedCTCCAACTGTGACCACTTCCTGGCAGTCTGAGAG exon exclusionCTTACCTGTGTCACTGTCAGCTAGCCAGAGTTGG sequence isCACACAGAAAGCCTGCCAGTGTCACTAGGCCCT SEQ ID NO:GAGTCCTGGCAGCAGATTGCAATGGATCCTGAA 122. GAAGTGAAAAGCTTAGACAGCAACGGAGCTGGAGAGAAGAGTGAGAACAACTCCTCTAATTCTGAC ATTGTGCACGTGGAGAAAGAAGAGGTGCCCGAGGGCATGGAAGAGGCTGCTGTGGCTTCTGTGGTCT TGCCAGCGCGGGAGCTGCAAGAGGCACTTCCTGAAGCCCCAGCTCCCTTGCTTCCACATATCACTGC CACCTCCCTGCTGGGGACAAGGGAACCTGACACAGAAGTGATCACAGTTGAGAAATCCAGCCCTGC TACATCTCTGTTTGTAGAACTTGATGAAGAAGAGGTGAAAGCAGCAACAACTGAACCTACTGAAGTG GAGGAGGTGGTCCCCGCACTGGAACCCACAGAAACGCTGCTGAGTGAGAAGGAGATAAACGCAAGG GAAGAGAGCCTTGTGGAAGAGCTGTCCCCTGCCAGCGAGAAGAAGCCCGTGCCGCCGTCTGAGGGC AAGTCTAGACTGTCCCCCGCCGGTGAGATGAAGCCCATGCCGCTGTCTGAGGGCAAGTCTATACTGC TGTTTGGAGGGGCTGCTGCTGTTGCCATCCTGGCAGTGGCCATCGGGGTAGCCCTGGCTCTGAGAAA GAAATAGGAGGCTTTTCAGAAGAGAAAGACAGAAGGATGTAAGGTTGGAGTTGTATTGGCTGGAATT TGAACCTCCAGCAGCTGTCTGGACATTTGTGGAACACTCTGGGATAATTGGGGACTTCTGCTCAACAT GGCAGTGGCATGTTAGGCATGTTAGGGCTTGAGGTGGGGCATTCACATTCATCTGACTGTAAATCCC AAGGGCCTCCGCTCATGCTAAATTGAGAATCTTAGGGGTAAAGCACCCCCTCCAGGACCGGGTTTCT CAGCCTTGGCACTAGTGCTGTTCTGACCATTCTCTGTGTTGGGGCTGTCCTGTGTGTGGTGGGCTCCA CCCACTAGATGCCAGTGGCACCCCCTCCCAGAGATGACAAACGAAAATGTCTCTAGACATTGCCAA ATGTCCCGTGTGAACATCCCCTATTGAGACCCACTGCTTTAGCGAGAGAGGGTTTACTTAGGAAGAA TTGGGATAGAAATTCCCAGCTGAGAGAACTTAGCTGTGGGCTCCTCAGCTACTGACTTCTTAGCTCT 
TAATCCCCTTAGAATTTCATCTTTCTCGATGAGCAGGCTCTGCACCCACTCTTTTTTTGCCCCCCGCC CTCATCCTGGAGTGTGAGGGTGCTCGCCCGTACTCTCAGCTGCCTCTCAGGGACTGCACTGTTCCTCT TCACCCCCAGGTTCCTGCTAAGATCCCACGGGCGAGGGCTTGCTCTGGACTCAGTCTGTCAAGTCCCC GAAGCTTCCTGCAGCTCCACCTTGTAAAAATGCTGCCTTTGGGAATCTTCGAAATATGTACACAGAG AAAATCACATGAAGGAGACCTGGGGTCCCCACTTGTGAGTGCAACTGCAAGTAACTCTGGCTAGAG AGACACATGTGTCTTGTGTCAAGGCAGGAGGATAACCTGGATGACCTTCTGAGGTCTCTTCAGCCCT TTTCGCTAGTGGTCACCCACCACCATGGTTACTTGCCAGCAACATCTCTATTGCTGGATGGTCCCTGT CTATAACCTTGGGCTAGTATATTTTTTCCAATATGGGACCTTAGTCTTACTACTGATGAGTTCTATGG GTCTCTTGCTAGGGGGTAAGGATTTTTATTCTTGGGCTTATAGAGCCAGTTAGATCATAATTCTTATG AAATAGAGAGTGTCCTAAATATCACTGAAATAAAAAGTAGGAAAAAGAAGCTTGAATTTTAAGACT GAGGCTGCTCTGCAGATTCTAGTTTGGCTTTCAGAGTTCAAGAGTGGTGGCATCTTCACCTGAATTCT TCAATGCCAGGGTAATAAACCAAAATAGTCCTAATCAGTATATGCTAGTTGAGCATCGGCATAATTT TCTTTCCTCTGGCTGATCCCAGCCCTAAAGGAAGGGTAGACCCGTGTCTTTCCAGCCCTAAAGGAAG GGTAGACCCGTGTCTTTCCAGCCCTAAAGGAAGGGCAGACCCGTGTCTTTCCATGCCCGAGGGCCAC GACGTCACTATGCAGGGCACACGTGGCTTGGTTTAAAAAGGTCATCTTAGATTTATCTTAGTAAATGT AATAAATTATTTTTTAGATCTTGAAATTTATAATAAAAATACTTTACCTACCCTGATC 22227 SELENBP1GTCATTGAGCCCAAGGACATCCATGCCAAGTGC 59 GAACTGGCCTTTCTCCACACCAGCCACTGCCTGGThe underlined CCAGCGGGGAAGTGATGATCAGCTCCCTGGGAG exon exclusionACGTCAAGGGCAATGGCAAAG GTCATCCACCG sequence isGCTGCCCATGCCCAACCTGAAGGACGAGCTG SEQ ID NO:CATCACTCAGGATGGAACACCTGCAGCAGCT 91. GCTTCGGTGATAGCACCAAGTCGCGCACCAAThe sequence GCTGGTGCTGCCCAGTCTCATCTCCTCTCGCA without theTCTATGTGGTGGACGTGGGCTCTGAGCCCCG underlined GGCCCCAAAGCTGCACAAGCTACGAAATGTGG exon exclusion GAATTGTGGACCCGGCTACTCCACCCCTCTGGAGsequence is GCCATGAAAG SEQ ID NO: 123. 
24742 LINC00630GTTGATTCCATACCCTGGCTATTGTGAATAATGC 60 TGCAGTGAACATGGGAGTACATACATCTGTTTGAThe underlined G GAACTCAGAGTGGTTTTCCAGATGGGAATCA exon exclusionCATTGCTCTCTGTCCCTGAGATCTTGCTGGAG sequence isACAGGGCTACTCAGTCCCTCTTTGCCAGGTAA SEQ ID NO:TCTGTTCCAGAAGAAACATGTGTCGTTCTGACTG 92.AGCCCCTGCCTGTCTGTCACCTTAAGAGCCAGTC The sequenceAATTCATATGGTCCCCATATCAAAGTCTCCTGTG without theCCCAGAGAGAGGATTTCATTTCAACCATCACCAT underlinedCACCACCATCATCATCATCACCAAGAGATGTTGT exon exclusion TGA sequence isSEQ ID NO: 124. 27194 CTBP2 GGTTCATAGTGGCGTCATGCACGCAGACTCCTGC 61AAGTTCCCCTAAGTTCTTAGAGGACTGCTTTGCC The underlinedTTTTGATCTGAGAGTTGCAAAGTTCCATAAAGAA exon exclusionTGGCCCTTGTGGATAAGCACAAAGTCAAGAGAC sequence is AGCGATTGGACAGAATTTGTGAAGATGGAGAAA SEQ ID NO: ACAAAGGATTCAGATTGAAGGACTGCTCAGA 93.CACCCTCCGAAGAGGTGGCCCTGCCTGCGCT The sequenceCCTCCTGGCTGCAGAGTACCCCACCAGCGC G without theAGATCCAGGGTTGCCAGAAGACGAGACAACCGT underlinedGATTGCATGTGCGGAGGTTCCTCGATGGAAGCG exon exclusionCAGCCCGGCGCGCCCCTCAGCTGGCCTGGCCAG sequence isGCCCTATGAAGGTCACGCGAAAACCCTGCTGCG SEQ ID NO:GGCTTCTTAGCGACCGCATTACGTGGACTAGCGG 125 GCAAGAAAAGCCTGGTCGGCGCTGCCCTCACAG30244 SLC52A2 AGGCGTCTGGCCAGGTGGCGCTCCGGGCAG GCC 62TACTTGGGTGTCCCCGCCTCTGATACCTCCCT The underlinedGCTGGAGGAAACAGCAGGAAAAGAGAACCAG exon exclusionGCAGGCAGGCAGACATCCCCACGGAGCAGCG sequence isTTGGGCCCCCAAGGTGCCTGACCCACTTCCTA SEQ ID NO:GAGTACTGAACAGTCCCAGAGTGTCACAGCT 94. GATGTGCAGGACAGCCTGGAGCTCTCACCTTThe sequence CAACACGGGGTGTACCTGAGACTTCCAGTGG without theATGAGGGTCAGCCTCTGGAGCTGTGAAAACC underlinedTGGGCCGACAGCGGAGGCAGAGCTGCACTAA exon exclusionTGTTCCCACACGAGTCCTTCCCACCCAACACC sequence isTTGGTGCAGGGAGACGGAAGGAGCCTGGAGC SEQ ID NO: CAGGGCTAGAAGAAGTCTTCACTTCCCAGGAGA 126. 
GCCAAAGCGTGTCTGGCCCTAGGTGGGAAAAGAACTGGCTGTGACCTTTGCCCTGACCTGGAAGGGC CCAGCCTTGGGCTGAATGGCAGCACCCACGCCCGCCCGTCCGGTGCTGACCCACCTGCTGGTGGCTC TCTTCGGCATGGGCTCCTGGGCTGCGGTCAATGGGATCTGGGTGGAGCTACCTGTGGTGGTCAAAGA GCTTCCAGAGG 33377 SLC38A1CTCTTTCTCTTCCTCCAGTTTCCAGTCCAGCCCTG 63TTGGCTCTCAGAATGCATCATCCTTCTCCCTGCA The underlinedGCGCTCTCACTGAACATGCTCAAGCGCAAGGAA exon exclusionCTTATAATCTTGTGTTCTCTGGATTCTGGATTTAG sequence isTAATCTGTATTAGTCTGTTCTCACACTGCTAATA SEQ ID NO:AAGAAATACCTGAGGTTGCTTCCAAGATAGCCA 95. AATAGGAACAGCTCTGGTCTGCAGCTCCCAGCAThe sequence AGATCGATGTAGAAGATGGGTGATTTCTGCATTT without theCCAACTGAGGTACCTGGTTCATCTCACTGGGACT underlinedGGTTGGACAGTGGGTGCAGCCCATGGAAGGTGA exon exclusionGCTGAAGCAAGGTGGGGCGTCACCTCACCCAGG sequence isAAGCACAAGGGGTCAGGGGATTTACCTTTCCCA SEQ ID NO:GCCAAGGGAAGCCATGACAGACTGTAACTGGAG 127.AAACGGTACACTCCTGACCAAATACTGCACTTTT CCCACAGTCTTAGCAACTGGCAGACCAGGTAATACCCTCCCGTGCCTGGCTCAGTGGGTTCCATGCC AACGGAGCCTTGCTCACTGCTAGCGCAACAGTCTAAGATCGACCTGCGACGCTGCAGCTTGATGCAG GGAGAGGCATCCAACATTGCTGAGGCTTGAGTAGCTCACAGTGTAAGCAAAGAGGCCCGGAAGCAC AAGTTGGGCAGAGCTCATCGCTGCTCAGCAGGGCCTACTGCCTCTATAGATTCCACCTCTGGAGGCA GGGCATGGCAGAAAAAAACGCAGCAGACAGCTTTTGCAGACTTAAACGTCCCTGTCTGATGGCTCTA AAGAGAGCAATGGTTCTCTCAGCATGGCATTCGAGCTCCAAGAACAGACAGACTGCCTCCCCAAGC AGGTCCCTGACCCCCATGTAGCTGGACTGGGAAACACCTCCCCATCAGGGGCTGAGAGATACCTCA AACACGTGGGTGCCCCTCTGGGACGAAGCTTCCAGAGGAAGGATCAGGCAGCAATATTTGCTATTC TGCAGCCTTTGCTGGTGATACCCAGGCAAACAGATTCTGGAGTGGACCTCCAGCAAACTCCAACAA ACCTGCAGCTGAGGGGTCTGACTGTGGGAAGGAAAACTAACAAAGAGAAAGCAATAGCATCAACAT CAACAAAAAGGACATCCACACCAAATCCCCATCTATAGGTCACCAACATCAAAGACCAAAGGTAGA TAAAACCACAAAGATGGGGAGAGAAACCAGAGCAGAAAAGCTGAAAATTCCAAAAAACAAGCACC TCTTCTCCTCCAAAGGATCGCAGCTCCTTGCCAGCAAGGGAACAAAACTAGACGGAGAATGAGTTTG ACAAGTTGACAGAAGTAGGCTTCAGAAGGTTGGTAATAACAAACTTCTCTGAGCTAAAGGAGCATCT TCTAACCCATCGCAAAGAGGCTAAAAACTGTGGAAAAAAAAAAGGTTAGATGAATGGCTAACTAGA ATAACCAGTGTAGAGAAGACCTCAAATGACCTGATGAAGCTGAAACCCACAGCACAAGAACTTCGA GACTCATGCACAAGCTTCAATAGCCGATTCGATCAAGTGGAAGAAAGGATATCAGTGATTGAAGATC 
AAATTAATGAAATAAAGTGAGAAGAAT GTCTGGTGAAGTTCAAGGGCATCTTGAACGTGGTGC ACTTGGAGACAGTGAGGGAAGCAGGGGTGAAGTGGCTGCTACCTGAGTCCCTTCTGGAGCTCC ATTTTGCTTGGTCTTGGAGAAGGCTTCTCAGCTGCCCTCCCAGCTAGT GAGTTACATCTGCTAAC ATGCTTATTTTCATTCTTCCTTCATCTCTTTATTTAAAAATCACAGACCAGGATGGAGATAAAGGAACT CAAAGAATTTGG 40521 FAM65AAAACTGGGCACATTTGGGCCCCTGCGCTGCCAG 64 GAGGCATGGGCCCTGGAGCGGCTGCTGCGGGAAThe underlined GCCCGAGTACTGGAGGCAGTATGCGAGTTCAGC exon exclusionAGGCGGTGGGAGATCCCGGCCAGCTCTGCCCAG sequence is GAAGTGGTGCAGTTCTCGGCCTCTCGGCCTG SEQ ID NO: GCTTCCTGACCTTCTGGGACCAGTGCACAGA96. GAGACTCAGCTGCTTCCTCTGCCCGGTGGAG The sequenceCGGGTGCTTCTCACCTTCTGCAACCAGTATGG without theTGCCCGCCTCTCCCTGCGCCAGCCAGGCTTG underlined GCTGAGGCTGTGTGTGTGAAGTTCCTGGAGGAT exon exclusion GCCCTGGGGCAGAAGCTGCCCAGAAGGCCCCAGsequence is CCAGGGCCTGGAGAGCAGCTCACAGTCTTCCAG SEQ ID NO:TTCTGGAGTTTTGTGGAAACCTTGGACAGCCCCA 128.CCATGGAGGCCTACGTGACTGAGACCGCTGAGG AGG 41168 USP25TAATGGAAACTTGGAATTAGCAGTGGCTTTCCTT 65 ACTGCGAAGAATGCTAAGACCCCTCAGCAGGAGThe underlined GAGACAACTTACTACCAAACAGCACTTCCTGGC exon exclusionAATGATAGATACATCAGTGTGGGAAGCCAAGCA sequence is GATACAAATGTGATTGATCTCACTGGAGATGA SEQ ID NO: TAAAGATGATCTTCAGAGAGCAATTGCCTTGA97. GTTTGGCCGAATCAAACAGGGCATTCAGGGA The sequenceGACTGGAATAACTGATGAGGAACAAGCCATT without the AGCAGAGTTCTTGAAGCCAGCATAGCAGAGAAT underlinedAAAGCATGTTTGAAGAGGACACCTACAGAAGTT exon exclusionTGGAGGGATTCTCGAAACCCTTATGATAGAAAA sequence isAGACAGGACAAAGCTCCCGTTGGGCTAAAGAAT SEQ ID NO:GTTGGCAATACTTGTTGGTTTAGTGCTGTTATTC 129. AG 45885 HMOX2AACCGGATGCTACGGGTGATGACTGGGAGGAGG 66 AGAAAAATTACCTCTTTATCTTGCATGAACATCTThe underlined TAATTTTCAG AGTCTTGCTGCGACACCCAGGC exon exclusionTGGAGTGCAATGGCGCTATCTCGGCTCACTG sequence isCAACCTCCGCTTCCCGGATTCAAGCGATTCTC SEQ ID NO:CTGCCTCAGCCTCCCGAGTAGGTGGGACTAC 98. AG GACCAGAGGAGCGAGAGCAGCAAGAACCACThe sequence ACCCAGCAGCAATGTCAGCGGAAGTGGAAACCT without theCAGAGGGGGTAGACGAGTCAGAAAAAAAGAAC underlinedTCTGGGGCCCTAGAAAAGGAGAACCAAATGAG exon exclusion sequence is SEQ ID NO:130. 
50148 MKRN2OS GGGTTGTGTATAATTACAGTGCACATGGTGTCCA 67GCGAGACGGAGAAGGGTGGGAAGAGAGCATAA The underlinedGCATCCCATTACTGCAGCCCAACATGTATGGAAT exon exclusionGATGGAGCAATGGGACAAGTACCTGGAAGACTT sequence isCTCCACCTCGGGGGCCTGGCTGCCTCACAG AGA SEQ ID NO:GTATGATGGAAGGTCTGATCTTCATGTTGGAA 99. TAACTAACACAAATG GTATAATGAGGAAAAGGThe sequence AAGTCTCCGGAAACCTCCCCTAGCATTCCAGGA without theGGCGAAAGCTATGCACTGCGCAGAGGCTGGGAA underlinedGGCTTTAATTAAATTCAACCACTGTGAGAAATAC exon exclusionATCTACAGCTTCAGTGTGCCCCAGTGCTGCCCTC sequence isTCTGCCAGCAGGACCTGGGCTCGAGGAAGCTGG SEQ ID NO:AGGACGCACCTGTTAGCATCGCTAATCCATTTAC 131.TAATGGACATCAAGAAAAATGTTCATTCCTCCTC AGACCAACTCAGGGGACATTTCTTAG 52249ATP8A2P1 GTAAACAAATTGCTCCTGTGGAGATGATTGGCAT 68CACATGGTGTTTTGAGCTGATACACCCAACACTT The underlined GAGCTCACTGCAACAGTACCAGATTTTCACCGC exon exclusion TATGCCTCCTTTCACTCTGGGAGTCTTCCAGA sequence isGGTCTTGCACTCGGGAGAGCATGCTCAGGTT SEQ ID NO:TCCCCAGCTCTACAAAATCACCCAGAATGCCA 100. AAGACTTCAACACAAGGGTAAATAAGGTTGATThe sequence CTCAGAATTGTCACCTCAAAAAGGCCCTGCCT without theTCCACTGTTCAGTTCTGGTCATCTGCCTATGA underlinedGATATCTGAAGCTTGAAAGAGAACACTTGAAA exon exclusionATCACTGAGACCGTGACTCCCATCCCAGCACA sequence is CACAGCAAGCCAAATACTGTGTTGACCAGTGGT SEQ ID NO: CATGCCACTGCCTGTTGATTTGTTGAAAATATTG 132.TTTACACG 53188 HIBCH TTTTAATTGATAAAGACCAGAGTCCAAAATGGA 69AACCAGCTGATCTAAAAGAAGTTACTGAGGAAG The underlinedATTTGAATAATCACTTTAAGTCTTTGGGAAGCAG exon exclusionTGATTTGAAATTTTGAGGTGACAGGCTTTTAAGG sequence isTATATTTTGTAGCATGGGTTGGCAATCTACAGCA SEQ ID NO:TGTGGGCCAAATCCAGCCTGCTGCCTGTTTTTAT 101.ATACCCTGTAAGCTAAGAATGGTTTCCGCATTTT The sequenceTAAATGGTTGGGAAAAGAAATCAAAGACTAATA without theATTCATGACGTGAAAATTATCAGAATTCACAAAT underlinedAAAGCTTTATTGGAACTAGCTATACTCATCTGTT exon exclusionTATATATTATCTGTGGCTGCTTTGAAATGAGTAG sequence isTTGCAATAGAGATGGTAAAGCCTACAAAGCCTA SEQ ID NO:ATTATTTACTGTCTGGTTTTTGTCAGAAAAAAGT 133.TTGTCAATCCTTGTTTTAGAAGATGGAAAAATGT GAAGATCTTTGGAGATTCTCTTGAGTGGTATATCTAATTGAAATGGGATCTTCGTTTGGCTTGTATGT TGATGAAATCAACTTAGGTATACAATATAAAAAATAAAGACCCTGAAAATTGTTTTGG 
AGAGGTCA TGACTTTCATGAAGGCGTTAGAGCTG GTAATTAATAAAATGTCTCCAACATCTCTAAAGATCACAC TAAGGCAACTCATGGAGGGGTCTTCAAAGACCTTGCAAGAAGTACTAACTATGGAGTATCGGCTAA GTCAAGCTTGTATG 58853 SLC35C2CGCGCGGCACTGGTCCTGGTGGTCCTCCTCATCG 70 CCGGGGGTCTCTTCATGTTCACCTACAAGTCCACThe underlined ACAGTTCAACGTGGAGGGCTTCGCCTTGGTGCTG exon exclusionGGGGCCTCGTTCATCGGTGGCATTCGCTGGACCC sequence isTCACCCAGATGCTCCTGCAGAAGGCTGAACTCG SEQ ID NO:GACCAAATCCTCAGCTGTCCTCTTCATCTTGA 102. TCTTCTCTCTGATCTTCAAGCTGGAGGAGCTGThe sequence CTCTGGCGACGGCGCTTGACGTGGGCTTGTCCAA without theCTGGAGCTTCCTGTATGTCACCGTCTCGCT underlined exon exclusion sequence isSEQ ID NO: 134. 59314 TRIM5 GGATCTGTGAACAAGAGGAACCTCAGCAGCCAG 71GACAGGCAGGAGCAGTGGAATAGCTACTATGGC The underlinedTTCTGGAATCCTGGTTAATGTAAAGGAGGAGGT exon exclusionGACCTGCCCCATCTGCCTGGAACTCCTGACACAA sequence isCCCCTGAGCCTGGACTGCGGCCACAGCTTCTGCC SEQ ID NO:AAGCATGCCTCACTGCAAACCACAAGAAGTCCA 103. TGCTAGACAAAGGAGAGAGTAGCTGCCCTGTGTThe sequence GCCGGATCAGTTACCAGCCTGAGAACATACGGC without theCTAATCGGCATGTAGCCAACATAGTGGAGAAGC underlinedTCAGGGAGGTCAAGTTGAGCCCAGAGGGGCAGA exon exclusionAAGTTGATCATTGTGCACGCCATGGAGAGAAAC sequence isTTCTACTCTTCTGTCAGGAGGACGGGAAGGTCAT SEQ ID NO:TTGCTGGCTTTGTGAGCGGTCTCAGGAGCACCGT 135.GGTCACCACACGTTCCTCACAGAGGAGGTTGCC CGGGAGTACCAA GATCCAGGCAATCTTTCCAGACACATCTACTTCCCAGTAATATTTCCCCGAA GAGAAATATTGGCAGCCGAAGACACCAAAAGCAGAAAAATCACATGGATTTGAATTCTTAAAT GTGCAGCAG GTCTAAGGCCCGCCTGTTCTGTGCCGTGACCTGTGCTACCGAAGTCATCTGTTGCTGT AGGGAGGCCAGGGACTCAGCCGATGCCTCAATGGCCAACTGCAG 60239 HSD17B6 TCCTCGCCTCCATCACCTCCACCGTAGTTGAGCC 72AGCGATAGTACTGAGAGTAGGGAAAGAGCCTCC The underlinedGGTAATAAAGTTTAAGCAGCTCGGGCAGCTCGG exon exclusionTGGGGTCAAACGTCTCCATTGAGCGCGGAACTC sequence is GCCACGTAACAGATCTGATTCTGCAGCTGATC SEQ ID NO: AAGGATGACACTGGTGAGAACCCTATGAGGG104. 
AGTGAAGCAGCCTGGACTCTTACCACAAGAG The sequenceGGAGGTGTTATAAGAGCAATGCAGAGGTTGG without theAGTGGGCAGCAGTTGGGGCAGGAGGAAGCCG underlinedACTGCTGCCTGGTCTGCAAAGAAGTCCTTTCA exon exclusionAGTCTCTAGGACTGGACTCTTCCTAAGCAAGT sequence is CCGAGAAGGAAGCACCCTCACTATGTGGCTCTA SEQ ID NO:CCTGGCGGCCTTCGTGGGCCTGTACTACCTTCTG 136.CACTGGTACCGGGAGAGGCAGGTGGTGAGCCAC CTCCAAGACAAGTATGTCTTTATCACGGGCTGTGACTCGGGCTTTGGGAACCTGCTGGCCAGACAGC TGGATGCACGAGGCTTGAGAGTGCTGGCTGCGTGTCTGACGGAGAAGGGGGCCGAGCAGCTGAGGG GCCAGACGTCTGACAGGCTGGAGACGGTGACCCTGGATGTTACCAAGATGGAGAGCATCGCTGCAG CTACTCAGTGGGTGAAGGAGCATGTGGGGGACA GAG

All references, patents and patent applications disclosed herein are incorporated by reference with respect to the subject matter for which each is cited, which in some cases may encompass the entirety of the document.

The indefinite articles “a” and “an,” as used herein in the specification and in the claims, unless clearly indicated to the contrary, should be understood to mean “at least one.”

It should also be understood that, unless clearly indicated to the contrary, in any methods claimed herein that include more than one step or act, the order of the steps or acts of the method is not necessarily limited to the order in which the steps or acts of the method are recited.

In the claims, as well as in the specification above, all transitional phrases such as “comprising,” “including,” “carrying,” “having,” “containing,” “involving,” “holding,” “composed of,” and the like are to be understood to be open-ended, i.e., to mean including but not limited to. Only the transitional phrases “consisting of” and “consisting essentially of” shall be closed or semi-closed transitional phrases, respectively, as set forth in the United States Patent Office Manual of Patent Examining Procedures, Section 2111.03.

The terms “about” and “substantially” preceding a numerical value mean ±10% of the recited numerical value.
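By way of illustration only, the ±10% convention stated above can be expressed as a short calculation. The following Python sketch is not part of the disclosed methods; the function name `about_range` and its `tolerance` parameter are hypothetical and are introduced here solely for illustration.

```python
def about_range(value, tolerance=0.10):
    """Return the (low, high) bounds implied by "about <value>".

    Assumes the definition above: "about" means plus or minus 10% of
    the recited numerical value. The name and the default tolerance
    argument are illustrative only.
    """
    delta = abs(value) * tolerance  # 10% of the recited value
    return (value - delta, value + delta)


# Example: "about 50" spans 50 - 5 to 50 + 5.
low, high = about_range(50)
print(low, high)
```

Under this reading, a recited value of “about 50” covers values from 45 to 55.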

Where a range of values is provided, each value between the upper and lower ends of the range is specifically contemplated and described herein.

1. A method comprising assaying nucleic acids of a sample for the presence or absence of: (a) a target exon comprising a nucleotide sequence of any one of SEQ ID NOS: 22-24, 26-36, 38-40, 73-75, 77-79, 82-100, 102-104; (b) at least 2 target exons, wherein each target exon comprises a nucleotide sequence of any one of SEQ ID NOS: 23, 27, 35, 85, 88, 89, 98, 101, 102, or 104; (c) at least 3 target exons, wherein each target exon comprises a nucleotide sequence of any one of SEQ ID NOS: 21, 23, 27, 30, 31, 32, 35, 36, 39, 85, 87-89, 91, 94, 98, or 101-104; or (d) at least 8 different target exons, wherein each target exon comprises a nucleotide sequence of any one of SEQ ID NOs: 21-40 or 73-104.
 2. The method of claim 1, wherein the target exon comprises a nucleotide sequence of any one of SEQ ID NOS: 27, 98, 102, or 104.
 3. (canceled)
 4. The method of claim 1, wherein each target exon comprises a nucleotide sequence of any one of SEQ ID NOS: 27, 98, 101, 102, or 104.
 5.-6. (canceled)
 7. The method of claim 1, wherein the sample is a breast tissue sample.
 8. The method of claim 7, wherein the sample is obtained from a subject suspected of having, at risk of, or diagnosed with breast cancer.
 9. The method of claim 8, wherein the subject is a female subject.
 10. The method of claim 1, wherein the nucleic acids comprise messenger ribonucleic acid (mRNA).
 11. The method of claim 1, wherein the nucleic acids comprise complementary deoxyribonucleic acid (cDNA) synthesized from mRNA obtained from the sample.
 12. The method of claim 1, further comprising detecting the presence of a target exon comprising a nucleotide sequence of any one of SEQ ID NOs: 24, 28, 31, 33, and/or 38 or the absence of a target exon comprising a nucleotide sequence of any one of SEQ ID NOs: 82, 87 and/or 91, and assigning a favorable survival prognosis to the sample.
 13. The method of claim 1, further comprising detecting the presence of a target exon comprising a nucleotide sequence of any one of SEQ ID NOs: 21-23, 25-27, 29, 30, 32, and/or 34-40 or the absence of a target exon comprising a nucleotide sequence of any one of SEQ ID NOs: 73-81, 83-86, 88-90, and/or 92-104, and assigning an unfavorable survival prognosis to the sample.
 14. A complementary deoxyribonucleic acid (cDNA) comprising a nucleotide sequence of any one of SEQ ID NOs: 1-20 or 105-136.
 15. The cDNA of claim 14 comprising a nucleotide sequence of any one of SEQ ID NOs: 22-24, 27-34, 36, 38, or 40.
 16. A composition comprising the cDNA of claim 14.
 17. The composition of claim 16 further comprising a probe that binds the cDNA or a pair of primers that bind the cDNA.
 18. (canceled)
 19. A composition comprising (a) a messenger ribonucleic acid (mRNA) comprising a nucleotide sequence of any one of SEQ ID NOs: 1-20 or 105-136 and (b) a probe that binds a nucleotide sequence of any one of SEQ ID NOs: 1-20 or 105-136 or a pair of primers that bind a nucleotide sequence of any one of SEQ ID NOs: 1-20 or 105-136.
 20. The composition of claim 17, wherein the probe comprises a detectable label or the primers comprise a detectable label.
 21. (canceled)
 22. The composition of claim 19, wherein the probe comprises a detectable label or the primers comprise a detectable label.
 23. A kit comprising: a molecule that can detect the presence or absence of a target exon comprising a nucleotide sequence of any one of SEQ ID NOS: 22-24, 26-36, 38-40, 73-75, 77-79, 82-100, 102-104, and a detection reagent selected from buffers, salts, polymerases, and deoxyribonucleotide triphosphates (dNTPs), molecules that can detect the presence or absence of at least 2 target exons, wherein each of the at least 2 target exons comprises a nucleotide sequence of any one of SEQ ID NOs: 23, 27, 35, 85, 88, 89, 98, 101, 102, or 104, and a detection reagent selected from buffers, salts, polymerases, and dNTPs, molecules that can detect the presence or absence of at least 3 target exons, wherein each of the at least 3 target exons comprises a nucleotide sequence of any one of SEQ ID NOS: 21, 23, 27, 30, 31, 32, 35, 36, 39, 85, 87-89, 91, 94, 98, or 101-104, and a detection reagent selected from buffers, salts, polymerases, and dNTPs, or molecules that can detect the presence or absence of at least 8 different target exons, wherein each of the at least 8 target exons comprises a nucleotide sequence of any one of SEQ ID NOs: 21-40 or 73-104, and a detection reagent selected from buffers, salts, polymerases, and dNTPs.
 24. The kit of claim 23, wherein the molecule comprises a probe or primer that binds a nucleic acid comprising a nucleotide sequence of any one of SEQ ID NOS: 22-24, 26-36, 38-40, 73-75, 77-79, 82-100, 102-104.
 25.-27. (canceled)
 28. The kit of claim 24, wherein the probe or primer comprises a detectable label. 
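
By way of non-limiting illustration, the prognosis assignments recited in claims 12 and 13 can be sketched as a lookup over SEQ ID NOs. The Python sketch below is an assumption-laden example and not the claimed method itself: the helper `expand`, the function `prognosis_signals`, and the constant names are hypothetical, and the inputs are assumed to be sets of SEQ ID NOs whose target exons were detected as present or absent by some upstream assay. Because SEQ ID NO: 38 falls within both recited presence lists, the sketch reports the matching signals for each prognosis rather than a single verdict.

```python
def expand(spec):
    """Expand a list such as '21-23, 25' into a set of integers."""
    ids = set()
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            lo, hi = part.split("-")
            ids.update(range(int(lo), int(hi) + 1))
        else:
            ids.add(int(part))
    return ids


# SEQ ID NO lists copied from claims 12 and 13 (constant names are illustrative).
FAVORABLE_IF_PRESENT = expand("24, 28, 31, 33, 38")
FAVORABLE_IF_ABSENT = expand("82, 87, 91")
UNFAVORABLE_IF_PRESENT = expand("21-23, 25-27, 29, 30, 32, 34-40")
UNFAVORABLE_IF_ABSENT = expand("73-81, 83-86, 88-90, 92-104")


def prognosis_signals(present, absent):
    """Return the SEQ ID NOs that match each recited prognosis condition.

    present: SEQ ID NOs of target exons detected in the sample.
    absent:  SEQ ID NOs of target exons assayed and not detected.
    """
    present, absent = set(present), set(absent)
    return {
        "favorable": sorted(
            (present & FAVORABLE_IF_PRESENT) | (absent & FAVORABLE_IF_ABSENT)
        ),
        "unfavorable": sorted(
            (present & UNFAVORABLE_IF_PRESENT) | (absent & UNFAVORABLE_IF_ABSENT)
        ),
    }


# Example: exon of SEQ ID NO: 24 present, exon of SEQ ID NO: 98 absent.
print(prognosis_signals(present={24}, absent={98}))
```

How conflicting signals from several exons would be combined into one prognosis is not specified in the claims and is left open in this sketch.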